Determining Field Width in Characters or Bytes

Connectors specify their field width in either characters or bytes. Knowing which unit a connector uses is important when you choose an encoding other than the default, because the encoding determines how many bytes each character occupies.

Field Width in Characters

Many connectors specify their field width in characters: a field of width N holds N characters. The number of bytes the field occupies depends on both the encoding and the particular characters written. For example:

• In UTF-8, a single character is encoded as one, two, three, or four bytes. Thus a five-character field is written as at least 5 bytes and at most 20 bytes.

• UCS-2 is a fixed-width double-byte character set: every character takes two bytes, and UCS-2 can represent only the first Unicode plane (the Basic Multilingual Plane).

• UTF-16 represents most characters in current use as two bytes; however, characters outside the first Unicode plane take four bytes. Currently, UTF-16 is treated as UCS-2.

• Shift-JIS is a multibyte character set. In Shift-JIS, a character takes either 1 or 2 bytes, depending on the character. Thus, a five-character field takes from 5 to 10 bytes.
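The byte counts in the notes above can be checked with a short Python sketch. It is illustrative only and is not part of any connector: the helper name encoded_width and the sample strings are assumptions, and the codec names are the ones Python's standard library uses.

```python
def encoded_width(text: str, encoding: str) -> int:
    """Return the number of bytes `text` occupies in `encoding`."""
    return len(text.encode(encoding))


# Two hypothetical five-character field values.
samples = {
    "ASCII": "ABCDE",
    "Kanji": "日本語文字",
}

# "utf-16-le" is used so that no byte order mark is added to the count.
for label, text in samples.items():
    for encoding in ("utf-8", "utf-16-le", "shift_jis"):
        width = encoded_width(text, encoding)
        print(f"{label:5} {encoding:10} {len(text)} characters, {width} bytes")
```

With these samples, the ASCII value takes 5 bytes in UTF-8 and Shift-JIS, while the Kanji value takes 15 bytes in UTF-8 and 10 bytes in Shift-JIS. Both values take 10 bytes in UTF-16, because all of the sample characters are in the first Unicode plane.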

Field Width in Bytes

Some connectors specify their field width in bytes. In this case the byte count is fixed, and what varies is the number of characters that fit into the field. For example, if a field width is 10 bytes and the encoding is Shift-JIS, the field holds at most 10 single-byte characters but only 5 double-byte characters. This means that if you try to write 8 Kanji characters, the data is truncated: each Kanji character takes 2 bytes, so 8 of them need 16 bytes.
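The following Python sketch shows one way to work out how many characters fit in a byte-width field. It is a minimal illustration, not the connector's actual truncation logic, which this topic does not describe: the function name fit_to_byte_width is hypothetical, and the sketch assumes truncation happens on a whole-character boundary in an encoding that adds no byte order mark.

```python
def fit_to_byte_width(text: str, width: int, encoding: str = "shift_jis") -> str:
    """Return the longest prefix of `text` whose encoded form fits in `width` bytes.

    Assumption: each character can be encoded on its own, with no byte order
    mark or shift state (true for shift_jis and utf-8 in Python).
    """
    out = bytearray()
    for ch in text:
        chunk = ch.encode(encoding)
        if len(out) + len(chunk) > width:
            break  # the next character would overflow the field
        out += chunk
    return out.decode(encoding)


kanji = "漢字漢字漢字漢字"  # 8 Kanji characters, 16 bytes in Shift-JIS
kept = fit_to_byte_width(kanji, 10)
print(len(kept), "of", len(kanji), "characters fit in 10 bytes")
```

Stopping at a character boundary avoids leaving half of a double-byte character at the end of the field. A connector that truncates at the raw byte level could behave differently.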