ENCODING
A built-in variable that specifies the character set used in the source file. Set it to a numeric value between -4 and 51; the values and the character sets they select are listed in the table below. Set ENCODING in the BEGIN block.
Example
# sets the character set to Shift-JIS (value 19).
BEGIN {
  ENCODING = 19;
}
 
| Character Code Set | Description                                | Value |
|--------------------|--------------------------------------------|-------|
| UTF16              | 16-bit Unicode                             | -4    |
| UTF8               | 8-bit Unicode                              | -3    |
| UCS2               | Unicode encoding used by Java and Windows  | -2    |
| OEM                | Default (Original Equipment Manufacturer)  | 0     |
| ISO-8859-1         | Latin 1 (Western Europe)                   | 1     |
| CP1252             | Latin 1 / Microsoft 1252                   | 2     |
| ISO-8859-2         | Latin 2 (Eastern Europe)                   | 3     |
| ISO-8859-3         | Latin 3 (Southern Europe)                  | 4     |
| ISO-8859-4         | Latin 4 (Northern Europe)                  | 5     |
| ISO-8859-5         | Cyrillic                                   | 6     |
| ISO-8859-6         | Arabic                                     | 7     |
| ISO-8859-7         | Greek                                      | 8     |
| ISO-8859-8         | Hebrew                                     | 9     |
| ISO-8859-9         | Turkish                                    | 10    |
| CP1250             | Windows Eastern European                   | 11    |
| CP1251             | Windows Cyrillic                           | 12    |
| CP1253             | Windows Greek                              | 13    |
| CP1254             | Windows Turkish                            | 14    |
| CP1255             | Windows Hebrew                             | 15    |
| CP1256             | Windows Arabic                             | 16    |
| CP1257             | Windows Baltic                             | 17    |
| CP1258             | Windows Vietnamese                         | 18    |
| SHIFT-JIS          | Japanese Industrial Standard               | 19    |
| WSHIFTJIS          | Windows Shift-JIS code page (932)          | 20    |
| GB2312             | Chinese, Simplified (936)                  | 21    |
| KSC5601            | Windows Unified Hangul (949)               | 22    |
| BIG5               | Windows Taiwan (Traditional) Chinese (950) | 23    |
| CP037              | EBCDIC USA/Canada                          | 24    |
| CP500              | EBCDIC International                       | 25    |
| CP875              | EBCDIC Greek                               | 26    |
| CP1026             | EBCDIC Turkish                             | 27    |
| CP437              | DOS Latin US                               | 28    |
| CP737              | DOS Greek                                  | 29    |
| CP775              | DOS Baltic Rim                             | 30    |
| CP850              | DOS Latin 1                                | 31    |
| CP852              | DOS Latin 2                                | 32    |
| CP855              | DOS Cyrillic                               | 33    |
| CP857              | DOS Turkish                                | 34    |
| CP860              | DOS Portuguese                             | 35    |
| CP861              | DOS Icelandic                              | 36    |
| CP862              | DOS Hebrew                                 | 37    |
| CP863              | DOS Canadian French                        | 38    |
| CP864              | DOS Arabic                                 | 39    |
| CP865              | DOS Nordic                                 | 40    |
| CP866              | DOS Cyrillic Russian                       | 41    |
| CP869              | DOS Greek 2                                | 42    |
| CP874              | DOS Thai                                   | 43    |
| CP273              | EBCDIC Germany, Austria                    | 44    |
| CP277              | EBCDIC Norway, Denmark                     | 45    |
| CP278              | EBCDIC Sweden, Finland                     | 46    |
| CP280              | EBCDIC Italy                               | 47    |
| CP284              | EBCDIC Spain, Latin America                | 48    |
| CP285              | EBCDIC United Kingdom                      | 49    |
| CP297              | EBCDIC France                              | 50    |
| CP1051             | Roman-8                                    | 51    |
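Before choosing a value, it can help to preview a few bytes of the source file with the candidate character set outside the product. The Python sketch below is an illustration only: the `ENCODING_TO_CODEC` dictionary and the `preview` function are hypothetical, and the mapping from ENCODING values to Python codec names is our assumption, not part of Content Extraction Language. Values without an obvious Python codec (UCS2, OEM, and CP277 through CP297) are deliberately left out.

```python
# Hypothetical mapping from ENCODING values (per the table above) to Python
# codec names. This is an assumption for illustration, not part of the
# language; UCS2 (-2), OEM (0) and the EBCDIC national code pages
# CP277 through CP297 (45-50) are omitted because Python has no exact match.
ENCODING_TO_CODEC = {
    -4: "utf-16", -3: "utf-8",
    1: "iso8859_1", 2: "cp1252", 3: "iso8859_2", 4: "iso8859_3",
    5: "iso8859_4", 6: "iso8859_5", 7: "iso8859_6", 8: "iso8859_7",
    9: "iso8859_8", 10: "iso8859_9",
    11: "cp1250", 12: "cp1251", 13: "cp1253", 14: "cp1254",
    15: "cp1255", 16: "cp1256", 17: "cp1257", 18: "cp1258",
    19: "shift_jis", 20: "cp932", 21: "gb2312", 22: "cp949", 23: "big5",
    24: "cp037", 25: "cp500", 26: "cp875", 27: "cp1026",
    28: "cp437", 29: "cp737", 30: "cp775", 31: "cp850", 32: "cp852",
    33: "cp855", 34: "cp857", 35: "cp860", 36: "cp861", 37: "cp862",
    38: "cp863", 39: "cp864", 40: "cp865", 41: "cp866", 42: "cp869",
    43: "cp874", 44: "cp273", 51: "hp_roman8",
}


def preview(data: bytes, encoding_value: int) -> str:
    """Decode raw source-file bytes with the character set an ENCODING value names."""
    # Reject values outside the documented range first.
    if not -4 <= encoding_value <= 51:
        raise ValueError(f"ENCODING must be between -4 and 51, got {encoding_value}")
    # In range but not covered by this partial, assumed mapping.
    codec = ENCODING_TO_CODEC.get(encoding_value)
    if codec is None:
        raise ValueError(f"no Python codec mapped for ENCODING = {encoding_value}")
    return data.decode(codec)


if __name__ == "__main__":
    # Bytes as they might appear in a Shift-JIS source file.
    sample = "日本語".encode("shift_jis")
    print(preview(sample, 19))  # the value used in the BEGIN example
```

If the preview shows garbled characters, the file probably uses a different character set, and another value from the table is worth trying.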