Determining Which Connector to Use
This section provides information about determining the type of connector to use.
Flat File Connectors
Unicode or ASCII
When working with text files that contain characters outside the ASCII range, use a Unicode connector instead of an ASCII connector. Even if your data is not Unicode, you can set the encoding property in the Unicode connectors, which offers both Unicode and non-Unicode formats.
Character Encoding
Setting the correct character encoding is important. If you are not sure which encoding to use, select UTF-8. It is the most common Unicode scheme and handles most issues. If needed, you can also use ASCII encoding.
Unicode Source Data
When connecting a Unicode source data file with the Unicode connector, the encoding scheme should automatically change from OEM to the proper character set after the source file is specified. For instance, the encoding may change to UCS-2 when your file is used as a source file. However, if the file does not have leading byte characters (a byte order mark) as described below, you may have to set the encoding scheme yourself (to UCS-2 or UTF-16, for example).
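The leading bytes mentioned above are commonly a byte order mark (BOM). As a rough illustration that is independent of the integration platform, the following Python sketch inspects the first bytes of a file and suggests a codec. The function name sniff_bom and the returned codec names are assumptions made for this example, not connector settings.

```python
import codecs

# Longer marks are checked first: the UTF-32 LE mark begins with
# the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(head: bytes, default=None):
    """Return a Python codec name suggested by a leading BOM, or default."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default
```

If the function returns the default, the file carries no byte order mark, which is the case where the encoding scheme may have to be set by hand.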
Unicode Target Data
When outputting to a Unicode target data file with the Unicode connector, the encoding scheme defaults to OEM. Change the encoding to the correct Unicode character set, such as UTF-8 or UTF-16. Note that choosing UCS-2 in the Map window provides the same results as choosing UTF-16. For the purposes of data exchange, UTF-16 and UCS-2 are the same: both use 16-bit code units, and every character in the Basic Multilingual Plane has the same representation in both. They differ only for characters outside that plane, which UTF-16 encodes as a pair of code units and UCS-2 cannot represent.
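To make the 16-bit code unit point concrete, here is a small Python sketch. Python has no separate UCS-2 codec, so the example uses UTF-16 only; the helper name utf16_code_units is hypothetical. Characters in the Basic Multilingual Plane take one code unit, while a character outside it takes two.

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to encode text as UTF-16."""
    # "utf-16-le" writes no byte order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2


# U+0041 and U+65E5 are in the Basic Multilingual Plane; U+1F600 is not.
for sample in ("A", "\u65e5", "\U0001F600"):
    print(ascii(sample), utf16_code_units(sample))
```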
Troubleshooting the Replacement Character Issue
Choosing OEM can result in an output file containing replacement characters, such as "?" or "." in place of characters that were not transformed to the OEM code page.
To fix the issue, change the encoding to the correct Unicode character set of your target file.
If some characters still fail to display correctly, do the following:
Check that the font and script are set correctly.
Consider using Windows Notepad or a hex viewer to view the data directly. This helps you to distinguish between transformation and display problems.
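The replacement character effect can be reproduced outside the integration platform. The following Python sketch assumes code page 437 as one example of an OEM code page; the sample text is invented for illustration. Characters missing from the code page come out as "?", while the UTF-8 bytes keep every character, and the hex view shows what is actually stored.

```python
text = "Caf\u00e9 \u65e5\u672c\u8a9e"  # "Cafe" with e-acute, then three Japanese characters

# cp437 contains e-acute but no CJK characters, so those are replaced.
oem_bytes = text.encode("cp437", errors="replace")
utf8_bytes = text.encode("utf-8")

print(oem_bytes)         # the three Japanese characters appear as "?"
print(utf8_bytes.hex())  # hex view of the bytes that UTF-8 output would hold
```

If the hex view shows the expected bytes but the text still looks wrong on screen, the problem is more likely display (font or script) than transformation.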
XML Connector
The XML connector encoding is determined by the encoding property in the XML file.
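As an illustration of where that property lives, the following Python sketch reads the encoding named in the XML declaration at the start of a file. It is a simplified assumption-laden example, not the connector's logic: the function name is hypothetical, and the pattern only works when the declaration itself is stored in an ASCII-compatible encoding. The UTF-8 fallback reflects the XML 1.0 default for documents with no declaration and no byte order mark.

```python
import re

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_xml_encoding(head: bytes, default: str = "UTF-8") -> str:
    """Return the encoding named in the XML declaration, or default."""
    match = _DECL.match(head)
    return match.group(1).decode("ascii") if match else default
```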
Database Connectors
Some database and database server installations require that Unicode properties be set during initial installation. Databases created afterward are sometimes limited to the Unicode or encoding properties chosen during the server installation, even if those properties were not configured properly. For this reason, it is important to consult the software manufacturer's documentation on how to set up Unicode database server installations and subsequent databases.
ODBC driver support - Use an ODBC driver that supports Unicode.
Oracle connector - See Encoding and Oracle Databases.
Sybase connector
SQL Server connector - Set the encoding to UTF-8 and use the Unicode data types, such as nchar, nvarchar, unichar, univarchar, nclob, and nvarchar2.
Components
Languages that use Unicode are automatically handled if you have the Java Development Kit (JDK) installed on your system. The integration platform installs the Java Runtime Environment (JRE), so in order to work with components you must choose regional language options during the JDK installation.
Encoding and Oracle Databases
If you have Unicode data in Oracle tables, we recommend using the ODBC 3.5 connector to connect. However, if all of your Unicode data has code points in the code page in use where the integration system is running, then either the ODBC 3.5 or the appropriate Oracle connector may be used.
In most cases, set the Encoding property to UTF-8 or UTF-16.
The default environment setting in Oracle is defined as the following:
Oracle NLS_LANG = language_territory.charset
AMERICAN_AMERICA.WE8ISO8859P1
AMERICAN_AMERICA.UTF8
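The language_territory.charset layout can be split into its three parts with a few lines of Python. This is a minimal sketch: the names NlsLang and parse_nls_lang are hypothetical, and it assumes all three parts are present, which is stricter than Oracle itself requires.

```python
from typing import NamedTuple


class NlsLang(NamedTuple):
    language: str
    territory: str
    charset: str


def parse_nls_lang(value: str) -> NlsLang:
    """Split an NLS_LANG value of the form language_territory.charset."""
    locale_part, dot, charset = value.partition(".")
    language, underscore, territory = locale_part.partition("_")
    if not (dot and underscore):
        raise ValueError(f"expected language_territory.charset, got {value!r}")
    # strip() tolerates stray spaces around the parts.
    return NlsLang(language.strip(), territory.strip(), charset.strip())
```

For example, parse_nls_lang("AMERICAN_AMERICA.UTF8").charset yields the character set portion, which is the part that matters most when matching the connector's Encoding property.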
To change the locale settings in Oracle
In the Oracle database, change the language and formatting settings for your tables as needed.
For instance, for the Japanese language, do the following:
Oracle NLS_LANG = language_territory.charset
Japanese_Japan.JA16SJIS
Japanese_Japan.UTF16
Example Use Case
A company is using the integration platform to move data from a delimited ASCII format to an Oracle database. The company is expanding its markets to include East Asian languages. The delimited text documents are mainly in UTF-8 format and occasionally in Japanese EUC-JP and Chinese GB2312 encoding.
The suggested approach for the company is as follows:
Source: Unicode (Delimited)
Target: ODBC 3.5 (Oracle)
Install the East Asian Language package and select Install files for East Asian Languages.
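Outside the integration platform, the same idea (read each file in its known source encoding, then write Unicode) can be sketched in Python. The function name transcode_to_utf8 is hypothetical, and the sketch assumes the source encoding of each document is already known, because EUC-JP and GB2312 byte sequences cannot be told apart reliably by trial decoding.

```python
def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode data with the stated source encoding and re-encode as UTF-8."""
    # errors="strict" raises UnicodeDecodeError on bad input instead of
    # silently substituting replacement characters.
    return data.decode(source_encoding, errors="strict").encode("utf-8")


# Sample inputs built in memory for illustration only.
japanese = "\u65e5\u672c\u8a9e".encode("euc_jp")
chinese = "\u4e2d\u6587".encode("gb2312")

print(transcode_to_utf8(japanese, "euc_jp").hex())
print(transcode_to_utf8(chinese, "gb2312").hex())
```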
The following are a few examples of the options you can set in Oracle.
Language Name        Language Type        Option
Simplified Chinese   LANG=zh_CN.GB2312    LC_ALL=zh_CN.GB2312
Traditional Chinese  LANG=zh_TW.BIG5      LC_ALL=zh_TW.BIG5
Japanese             LANG=ja_JP.eucJP     LC_ALL=ja_JP.eucJP
Korean               LANG=ko_KR.eucKR     LC_ALL=ko_KR.eucKR
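Each of the settings above ends in a codeset name after the dot. As a hedged illustration, the following Python sketch extracts that codeset and checks whether Python has a codec for it. The function name codeset_from_locale is hypothetical, and locale modifiers (such as a trailing "@" suffix) are not handled.

```python
import codecs


def codeset_from_locale(setting: str) -> str:
    """Return the codeset part of a value such as 'LANG=ja_JP.eucJP'."""
    # Drop an optional 'LANG=' or 'LC_ALL=' prefix.
    _, _, locale_name = setting.rpartition("=")
    _, dot, codeset = locale_name.partition(".")
    if not dot or not codeset:
        raise ValueError(f"no codeset in {setting!r}")
    return codeset


for setting in ("LANG=zh_CN.GB2312", "LANG=zh_TW.BIG5",
                "LANG=ja_JP.eucJP", "LANG=ko_KR.eucKR"):
    codeset = codeset_from_locale(setting)
    # codecs.lookup raises LookupError if Python has no matching codec.
    print(setting, "->", codecs.lookup(codeset).name)
```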
For more details on how Oracle supports encoding, refer to the Oracle documentation.