Understanding Unicode and Other Encoding Types
Unicode is an international character set. Like other character sets such as American Standard Code for Information Interchange (ASCII), the Unicode character set provides a standard correspondence between the binary numbers that computers understand and the letters, digits, and punctuation that people understand.
Unlike ASCII, however, Unicode provides a code for every character in nearly every language in the world. This task requires far more than the 128 characters defined by ASCII (or the 256 of its 8-bit extensions). ASCII is a 7-bit character set, while Unicode originally used 16-bit characters as the default.
Unicode characters are most commonly referred to by their 4-digit hexadecimal representations (0000 to FFFF). The numbers 0 (0000) through 127 (007F) correspond exactly to their ASCII counterparts. The complete correspondence between the integer values and the actual characters may be found at Unicode's website.
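As an illustrative sketch (in Python, which exposes Unicode code points directly), the built-in ord and chr functions convert between a character and its code point, and the hexadecimal values in the ASCII range match their ASCII counterparts exactly:

```python
def code_point(ch: str) -> str:
    """Return a character's Unicode code point as 4-digit hexadecimal."""
    return f"{ord(ch):04X}"

# Code points 0000 through 007F match ASCII exactly.
print(code_point("A"))   # Latin capital letter A -> 0041
print(code_point("é"))   # Latin small e with acute -> 00E9 (outside ASCII)
print(chr(0x0416))       # Code point 0416 -> Cyrillic capital letter Zhe
```

The same conversions are available in most languages; Python is used here only for brevity.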
Unicode includes the Latin alphabet used for English, the Cyrillic alphabet used for Russian, the Greek, Hebrew, and Arabic alphabets, and other alphabets used in Europe, Africa, and Asia, such as Japanese kana, Korean hangul, and Chinese bopomofo.
Much of the Unicode standard is devoted to thousands of unified character codes for Chinese, Japanese, and Korean ideographs. Adopted as an international standard in 1992, Unicode was originally a "double-byte," or 16-bit, binary number code that could represent up to 65,536 items.
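The arithmetic behind the "double-byte" design, and the two-byte storage it implies, can be sketched as follows (a Python illustration; the UTF-16 big-endian encoding is used here simply to show the raw 16-bit unit, and is not prescribed by the original design):

```python
# A 16-bit (double-byte) code can represent 2**16 distinct values.
print(2 ** 16)  # 65536

# A character in the original 16-bit range occupies exactly two bytes
# when encoded as a single 16-bit unit (big-endian, no byte-order mark).
zhe = "\u0416"                    # Cyrillic capital letter Zhe
encoded = zhe.encode("utf-16-be")
print(encoded.hex())              # 0416 -- the code point itself
print(len(encoded))               # 2 bytes per character
```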