Determining Which Connector to Use

When working with text files with special encoding characters, use a Unicode connector instead of an ASCII connector. Even if you do not have Unicode data, you have the option of setting the encoding property in the Unicode connectors. The encoding property options include non-Unicode and Unicode formats.

Setting the correct character encoding is important. If you are not sure which encoding to use, select UTF-8. It is the most common Unicode scheme and handles most issues. If needed, you can also use ASCII encoding.

When connecting a Unicode source data file with the Unicode connector, the encoding scheme should automatically change from OEM to the proper character set after the source file is specified. For instance, your file may specify UCS-2 when used as a source file. However, if the file does not have leading byte characters as described below, you may have to set the encoding scheme yourself (to UCS-2 or UTF-16, for example).
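The check for leading bytes can be sketched in Python. This is only an illustration, assuming the "leading byte characters" are a byte order mark (BOM); the function name detect_bom is hypothetical and is not part of the integration platform.

```python
import codecs

# BOM signatures. The UTF-32 LE mark begins with the UTF-16 LE mark,
# so the UTF-32 entries must be tested before the UTF-16 entries.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

When detect_bom returns None, the file carries no leading bytes to identify its encoding, which corresponds to the case where the encoding scheme has to be set by hand.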

When outputting to a Unicode target data file with the Unicode connector, the encoding scheme defaults to OEM. You should change the encoding to reflect the correct Unicode character set, such as UTF-8 or UTF-16. Note that choosing UCS-2 in the Map window provides the same results as choosing UTF-16. For the purposes of data exchange, UTF-16 and UCS-2 are the same for characters in the Basic Multilingual Plane: both use 16-bit code units and have the same code unit representation. Characters outside that range require a UTF-16 surrogate pair, which UCS-2 does not define.
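As a rough illustration of where UTF-16 and UCS-2 agree, the Python sketch below counts 16-bit code units. The helper name utf16_code_units and the two sample characters are chosen only for this example.

```python
bmp_char = "\u65e5"            # U+65E5, inside the Basic Multilingual Plane
supplementary = "\U0001F600"   # U+1F600, outside the Basic Multilingual Plane


def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units in the UTF-16 encoding of text."""
    return len(text.encode("utf-16-be")) // 2


print(utf16_code_units(bmp_char))       # one unit, same as in UCS-2
print(utf16_code_units(supplementary))  # two units: a surrogate pair
```

A character that needs one code unit is encoded identically in both schemes. A character that needs two has no UCS-2 representation.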

Choosing OEM can result in an output file containing replacement characters, such as "?" or "." in place of characters that were not transformed to the OEM code page.

To fix the issue, change the encoding to the correct Unicode character set of your target file.
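The effect can be reproduced outside the integration platform. The Python sketch below uses code page 437 as a stand-in for an OEM code page (an assumption; the OEM code page on your system may differ) and compares it with UTF-8.

```python
text = "Café 日本"

# cp437 stands in for an OEM code page. It contains "é" but no CJK
# characters, so the two CJK characters are replaced with "?".
oem_bytes = text.encode("cp437", errors="replace")

# UTF-8 can represent every character in the string.
utf8_bytes = text.encode("utf-8")

print(oem_bytes.decode("cp437"))   # CJK characters come back as "?"
print(utf8_bytes.decode("utf-8"))  # round-trips unchanged
```

Once a character has been replaced, the original value is no longer present in the output file, so the encoding must be corrected before the data is written.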

If some characters still fail to display correctly, do the following:

• Consider using Windows Notepad or a hex viewer to view the data directly. This helps you to distinguish between transformation and display problems.
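If no hex viewer is at hand, a few lines of Python can stand in for one. This is a minimal sketch; the function name hex_dump and its output layout are chosen only for this example.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format data as lines of an offset followed by space-separated hex bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        lines.append(f"{offset:08x}  {chunk.hex(' ')}")
    return "\n".join(lines)


print(hex_dump("é".encode("utf-8")))
```

In a UTF-8 target file, "é" should appear as the bytes c3 a9. If those bytes are present but the character looks wrong on screen, the problem is in the display. If the byte 3f ("?") appears instead, the character was replaced during transformation.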

The XML connector encoding is determined by the encoding property in the XML file.

Some database and database server installations require that Unicode properties be set during initial installation. Databases created afterward are sometimes limited to the Unicode or encoding properties chosen during the server installation, which may not have been configured properly. For this reason, it is important to read the software manufacturer's information on how to set up Unicode database server installations and subsequent databases.

• SQL Server connector - Set the encoding to UTF-8 and use Unicode data types. Depending on the database, these include nchar, nvarchar, unichar, univarchar, nclob, and nvarchar2.

Languages that use Unicode are automatically handled if you have the Java Development Kit (JDK) installed on your system. The integration platform installs the Java Runtime Environment (JRE), so in order to work with components you must choose regional language options during the JDK installation.

If you have Unicode data in Oracle tables, we recommend using the ODBC 3.5 connector to connect. However, if all of your Unicode data has code points in the code page in use where the integration system is running, then either the ODBC 3.5 or the appropriate Oracle connector may be used.

In most cases, set the Encoding property to UTF-8 or UTF-16.

The default environment setting in Oracle is defined as the following:

Oracle NLS_LANG = language_territory.charset
AMERICAN_AMERICA.WE8ISO8859P1
AMERICAN_AMERICA.UTF8

In the Oracle database, change the language and formatting settings for your tables as needed.

For instance, for the Japanese language, do the following:

Oracle NLS_LANG = language_territory.charset
Japanese_Japan.JA16SJIS
Japanese_Japan.UTF16
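To make the language_territory.charset layout concrete, the Python sketch below splits an NLS_LANG value into its three parts. The names NlsLang and parse_nls_lang are hypothetical. The sketch assumes all three parts are present and does not check them against the values Oracle accepts.

```python
from typing import NamedTuple


class NlsLang(NamedTuple):
    language: str
    territory: str
    charset: str


def parse_nls_lang(value: str) -> NlsLang:
    """Split an NLS_LANG value of the form language_territory.charset."""
    locale_part, dot, charset = value.strip().partition(".")
    language, underscore, territory = locale_part.partition("_")
    if not (dot and underscore and language and territory and charset):
        raise ValueError(f"expected language_territory.charset, got {value!r}")
    return NlsLang(language, territory, charset)


print(parse_nls_lang("AMERICAN_AMERICA.WE8ISO8859P1"))
print(parse_nls_lang("Japanese_Japan.JA16SJIS"))
```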

A company is using the integration platform to move data from a delimited ASCII format to an Oracle database. The company is expanding its markets to include East Asian languages. The delimited text documents are mainly in UTF-8 format and occasionally in Japanese EUC-JP and Chinese GB2312 encoding.

• Install the East Asian Language package and select Install files for East Asian Languages.
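For the occasional EUC-JP or GB2312 document in this scenario, one option is to normalize the file to UTF-8 before it reaches the connector. The Python sketch below shows the idea; the function name to_utf8 is hypothetical, and the source encoding of each file is assumed to be known in advance.

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode data from source_encoding and re-encode it as UTF-8."""
    # Strict decoding raises UnicodeDecodeError if the bytes are not
    # valid in source_encoding, which flags a wrongly labeled file.
    return data.decode(source_encoding).encode("utf-8")


japanese = "日本語".encode("euc_jp")   # sample EUC-JP input
chinese = "中文".encode("gb2312")      # sample GB2312 input

print(to_utf8(japanese, "euc_jp").decode("utf-8"))
print(to_utf8(chinese, "gb2312").decode("utf-8"))
```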

The following are a few examples of the options you can set in Oracle.

Language Name	Language Type	Option
Simplified Chinese	LANG=zh_CN.GB2312	LC_ALL=zh_CN.GB2312
Traditional Chinese	LANG=zh_TW.BIG5	LC_ALL=zh_TW.BIG5
Japanese	LANG=ja_JP.eucJP	LC_ALL=ja_JP.eucJP
Korean	LANG=ko_KR.eucKR	LC_ALL=ko_KR.eucKR