Character set encoding

Character set encoding refers to a set of characters and the way these characters are stored in memory. A coded character set is a character set in which each character corresponds to a unique number. The code unit size is the number of bits in the smallest unit that a particular encoding uses to represent characters. A single character may occupy one or more code units, depending on the encoding.

  • A code unit in US-ASCII consists of 7 bits.
  • A code unit in UTF-8, EBCDIC and GB18030 consists of 8 bits.
  • A code unit in UTF-16 consists of 16 bits.
  • A code unit in UTF-32 consists of 32 bits.
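
The difference between characters and code units can be illustrated with a short Python sketch. It is not part of the original glossary: the helper name `count_code_units`, the `CODE_UNIT_BITS` table and the sample string are assumptions chosen for illustration. The sketch encodes the same three-character string in UTF-8, UTF-16 and UTF-32 and counts how many code units each form needs. The little-endian codec names are used so that no byte order mark is added to the output.

```python
# Code unit sizes in bits, taken from the list above.
# The "-le" codecs are used so that no byte order mark is prepended.
CODE_UNIT_BITS = {
    "utf-8": 8,
    "utf-16-le": 16,
    "utf-32-le": 32,
}


def count_code_units(text: str, encoding: str) -> int:
    """Return the number of code units `text` occupies in `encoding`."""
    unit_bytes = CODE_UNIT_BITS[encoding] // 8
    encoded = text.encode(encoding)
    return len(encoded) // unit_bytes


# Three characters: LATIN CAPITAL LETTER A (U+0041), EURO SIGN (U+20AC),
# and GRINNING FACE (U+1F600), which lies outside the Basic Multilingual Plane.
sample = "A\u20ac\U0001F600"

for name in CODE_UNIT_BITS:
    print(name, count_code_units(sample, name))
```

In this sketch the string needs 8 code units in UTF-8 (1 + 3 + 4), 4 in UTF-16 (1 + 1 + 2, since the last character is stored as a surrogate pair), and 3 in UTF-32 (one per character).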
