Unicode (Fixed)
Unicode is a character set that, in its 16-bit form, uses two bytes for each character and can therefore represent far more characters than ASCII. Sixteen bits allow 65,536 distinct character values, enough to encode almost all the languages of the world. Unicode includes the ASCII character set as a subset. With this fixed text connector, you can read and write Unicode files.
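As a quick illustration of the two-bytes-per-character layout described above, the following Python sketch encodes a few characters as 16-bit Unicode. It uses Python's standard utf-16-le codec as a stand-in for the connector's encoding, which is an assumption; the sample string is arbitrary.

```python
# Sketch only: Python's "utf-16-le" codec stands in for 16-bit Unicode.
text = "A\u00e9\u65e5"  # an ASCII letter, an accented letter, a CJK character

encoded = text.encode("utf-16-le")

# Each of these characters occupies exactly two bytes.
bytes_per_char = len(encoded) // len(text)

# An ASCII character keeps its ASCII value; the second byte is zero.
ascii_pair = "A".encode("utf-16-le")

print(len(encoded), bytes_per_char, ascii_pair)
```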
Unicode (Fixed) data is any Unicode file in which no characters separate fields, records may or may not be separated, and each record occupies the same number of bytes.
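The layout described above can be sketched in Python as follows. This is a minimal illustration, not the connector's implementation: the function name, its parameters, and the sample data are hypothetical, and the sample assumes records with no separator between them.

```python
def split_fixed_records(data, record_len, field_widths, encoding="utf-16-le"):
    """Split raw bytes into records of record_len bytes, then into fields.

    field_widths are byte widths and must add up to record_len.
    (Hypothetical helper for illustration only.)
    """
    if sum(field_widths) != record_len:
        raise ValueError("field widths must add up to the record length")
    records = []
    # Step through the data one whole record at a time.
    for start in range(0, len(data) - record_len + 1, record_len):
        record = data[start:start + record_len]
        fields, pos = [], 0
        for width in field_widths:
            fields.append(record[pos:pos + width].decode(encoding))
            pos += width
        records.append(fields)
    return records


# Two 12-byte records: a 4-character name field and a 2-character code field.
sample = "AnnaUSJoe FR".encode("utf-16-le")
rows = split_fixed_records(sample, record_len=12, field_widths=[8, 4])
print(rows)
```

Because every record has the same byte length, the position of any record or field can be computed directly from its index, without scanning for separators.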
Supported Encoding
For the list of supported encodings, see Binary (International) Unicode Support.
Property Options
You can set the following source (S) and target (T) properties.
Property
S/T
Description
ByteOrder
ST
Allows you to specify the byte order of Unicode (wide) characters. The list box options are Auto (default), Little Endian, and Big Endian. With Auto, the byte order is determined by the architecture of your computer. Little Endian byte order is generally used by Intel machines and DEC Alphas and stores the least significant byte of a value first, at the lowest memory address. Big Endian byte order is used by IBM 370 computers, Motorola microprocessors, and most RISC-based systems and stores the most significant byte first, in the same order as the written binary representation.
CharFieldWidths
ST
Allows you to set field widths by number of characters or by number of bytes. With MBCS characters, a character may take more than one byte, so a file may have columns fixed by number of characters (regardless of the number of bytes) or columns fixed by number of bytes (holding a variable number of characters). If truncation occurs in a column, the last double-byte character is replaced by a single-byte padding character. The default is false, which sets the field width by number of bytes. True sets the field width by number of characters.
DatatypeSet
ST
Allows you to choose standard or COBOL data types in a Unicode (Fixed) data file. Standard is the default and means that all data in the file is readable.
If your Unicode (Fixed) file contains (or, for a target file, needs) COBOL display type fields and you are using a COBOL 01 copybook (fd) to define the fields, you must change this property option to COBOL before connecting to the COBOL copybook in the External Structured Schema window.
Encoding
ST
Type of encoding to use with source and target files. For details, see Additional Information About Encoding.
FieldSeparator
T
Allows you to choose a field separator character for your target file. The choices are None (default), comma, tab, space, carriage return-line feed, line feed, carriage return, line feed-carriage return, control-R, and pipe (|). If the alternate field separator is not one of the listed choices and is a printable character, see Alternate Tip on FieldSeparator Property.
Fill Fields
T
Allows writing a Unicode (Fixed) data file where every field is variable length. If this property is set to false, all trailing spaces are removed from each field when the data is written. The default is true. The true setting pads all fields with spaces to the end of the field length to maintain the fixed length of the records.
InsertEOFRecSep
S
Inserts a record separator after the last record of the file if it is missing. The default is false. If set to true, this property captures the last record (with no record separator) instead of discarding it.
NumericFormatNormalization
S
When set to true, handles thousands separators according to the conventions of the locale when numeric strings are converted to a numeric type. This property overrides any individual field settings. The default is false.
Order Mark
T
The Order Mark is a special character value that is sometimes written to a Unicode text file to indicate the byte order used for encoding each of the Unicode characters. In the integration platform, you can choose whether to write a byte order mark at the beginning of Unicode (wide) output. The default is false. To place the byte order mark at the beginning of your output, change this option to true.
Ragged Right
T
When set to true, writes a data file where the last field in each record is variable length. The default is false, which pads the last field with spaces to the end of the record length to maintain the fixed length of the records. The Ragged Right property works only when Fill Fields is set to false; it has no effect if Fill Fields is true. If Fill Fields is false, the Ragged Right property determines whether blank fields and fields containing only spaces appear at the end of the record.
RecordSeparator
ST
A Unicode (Fixed) file is presumed to have a carriage return-line feed (CR-LF) between records. To specify other characters to separate records, click RecordSeparator for a list of choices, including system default, carriage return-line feed (default), line feed, carriage return, line feed-carriage return, form feed, empty line, ctrl-E, and no record separator. To use a separator other than one from the list, enter it here. The SystemDefault setting enables the same transformation to run with CR-LF on Windows systems and LF on Unix systems without having to change this property.
If your field or record separator is not listed, highlight the default separator. Enter the characters you wish to use as a separator.
The Unicode connectors read the data from the file as Unicode and look for the Unicode characters specified as the separators to break the data up into fields or records. Then the actual Unicode data is assigned to fields or records.
Sample Size
S
Set the number of records (starting with record 1) that are analyzed to set a default width for each source field. The default is 1000. You can change the value to any number between 1 and the total number of records in your source file. As the number gets larger, more time is required to analyze the file, and it may be necessary to analyze every record to ensure no data is truncated. To change the value, click the Sample Size Current Value box, highlight the default value and type a new value.
StartOffset
S
If your source data file starts with characters that need to be excluded from the transformation, set the StartOffset option to specify at which byte of the file to begin. The default value is zero. The correct value may be determined by using the Hex Browser. For a list of the 256 standard and extended ASCII characters, search for "hex values" in the documentation. This property is set in number of bytes, not characters, regardless of the CharFieldWidths property setting.
StripLeadingBlanks
S
For a Unicode source file, by default, leading blanks are left in Unicode (Fixed) data. To delete leading blanks, click the StripLeadingBlanks Current Value box and click once. Then click the down arrow to the right of the box and click true.
StripTrailingBlanks
S
By default, trailing blanks are left in Unicode (Fixed) data. To delete trailing blanks, click the StripTrailingBlanks Current Value box and click once. Then click the down arrow to the right of the box and click true.
Tab Size
ST
If your source or target Unicode (Fixed) file has embedded tab characters representing white space, you can expand those tabs to a set number of spaces. The default value is zero.
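To make the interaction of several target properties in the table above more concrete, here is a minimal Python sketch of a writer that honors ByteOrder, Order Mark, Fill Fields, and RecordSeparator. It is an approximation under stated assumptions, not the connector's code: the function, its parameter names, and the "auto", "little", and "big" values are hypothetical, field widths are counted in characters, and Python's UTF-16 codecs stand in for the connector's encoding.

```python
import codecs
import sys


def write_fixed(records, widths, byte_order="auto", order_mark=False,
                fill_fields=True, record_separator="\r\n"):
    """Encode records as fixed-width 16-bit Unicode text.

    All names here are hypothetical and only mirror the property
    descriptions; widths are counted in characters.
    """
    if byte_order == "auto":
        byte_order = sys.byteorder  # "little" or "big", per the host CPU
    little = byte_order == "little"
    codec = "utf-16-le" if little else "utf-16-be"
    bom = codecs.BOM_UTF16_LE if little else codecs.BOM_UTF16_BE

    lines = []
    for record in records:
        fields = []
        for value, width in zip(record, widths):
            value = value[:width]  # truncate values that are too long
            # Fill Fields true: pad with spaces; false: drop trailing spaces.
            fields.append(value.ljust(width) if fill_fields else value.rstrip(" "))
        lines.append("".join(fields) + record_separator)

    body = "".join(lines).encode(codec)
    # Order Mark true: prefix the output with the byte order mark.
    return bom + body if order_mark else body


out = write_fixed([["Joe", "FR"]], widths=[4, 2], byte_order="big", order_mark=True)
print(out)
```

With byte_order="big" and order_mark=True, the output starts with the bytes FE FF, and each character that follows is stored most significant byte first.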
Alternate Tip on FieldSeparator Property
If the alternate field or record separator is not listed
1. Highlight the default separator.
2. Enter the characters you wish to use as a separator.
The Unicode connectors read the data from the file as Unicode and look for the Unicode characters specified as the separators to break the data up into fields or records. Then the actual Unicode data is assigned to fields or records.
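The following Python sketch illustrates why separators are matched against decoded Unicode characters rather than raw bytes. It is only an illustration with made-up sample data, and it assumes Python's utf-16-le codec as the encoding.

```python
# U+010A encodes in UTF-16 little endian as the bytes 0A 01, so its first
# byte looks like a line feed when the data is scanned byte by byte.
raw = "A\u010a\nB\n".encode("utf-16-le")

# Wrong: splitting the raw bytes on 0x0A cuts through the middle of characters.
naive = raw.split(b"\n")

# Right: decode to Unicode first, then split on the separator character.
text = raw.decode("utf-16-le")
records = text.rstrip("\n").split("\n")

print(len(naive), records)
```

The byte-level split produces four misaligned fragments, while the character-level split yields the two intended records.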
Additional Information About Encoding
You must be aware of the following regarding the Encoding property option:
Shift-JIS encoding is meaningful only in Japanese operating systems.
To display Chinese-Japanese-Korean-Vietnamese (CJKV) data in Data Browser
1. Verify that your operating system has at least one font available that corresponds to the specific character set and code page you want to use.
2. Select the Unicode connector and your encoding method in the source or target properties.
3. Go to the main menu and select View > Preferences > Fonts.
4. Choose a font that corresponds to your character set and encoding method.
For details on how to manually define the structure of fields and records, search for "data parser" in the documentation. If you have a COBOL 01 copybook file with which to define your fields, use Binary as your source connector.
Data Types
All data in Unicode files is Text, but you may wish to use other data types. The following data types are available:
Boolean (parses and displays true or false values)
Name (parses and displays proper name into name parts, such as honorifics, titles, last name, middle initial, first name)
Number (parses and displays numeric and floating values)
Text (parses and displays alphanumeric values)
Decimal (parses and displays a proper fraction whose denominator is a power of 10)
Note:  Use the Name data type if you want to parse a name into its component pieces. Some examples are honorifics (Mr., Dr.), names (first, middle, last) and titles (PhD., Jr.). You can also display the component pieces according to an edit mask. For example, you can set the mask to display a name in a field as "lastname, firstname" (Jones, Joan).
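As a rough analogy for treating text fields as other data types, the sketch below converts field text into typed values using Python's standard types. The helper function and its type names are hypothetical and do not reflect how the connector parses data; the Name type is omitted because its parsing rules are not described here.

```python
from decimal import Decimal


def convert_field(text, data_type):
    """Convert a text field to a typed value (hypothetical helper)."""
    value = text.strip()
    if data_type == "Boolean":
        return value.lower() == "true"
    if data_type == "Number":
        return float(value)
    if data_type == "Decimal":
        return Decimal(value)  # exact base-10 fraction, such as 1275/100
    return text  # Text: leave the field unchanged


print(convert_field("  12.75", "Decimal"))
```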