Importing UTF-8 Applications and Components

OpenROAD 6.2 lets you operate the runtime as in previous versions, dealing only with single-byte ASCII or extended ASCII exports on disk. Alternatively, you can enable the runtime to operate in a Unicode environment and import applications and components that are stored as valid UTF-8-encoded Unicode code points.

This section assumes you are using the OpenROAD 6.2 runtime in a Unicode-aware environment. Existing source applications and components that are encoded in extended ASCII must be converted to UTF-8.
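As a rough illustration of what converting extended ASCII to UTF-8 means at the byte level, the sketch below re-encodes single-byte data as UTF-8. The function name convert_to_utf8 and the default source code page (Windows-1252) are assumptions made for the example; the actual code page depends on where the export was created. This section does not say whether a proprietary export file can safely be re-encoded byte by byte, so treat this as a demonstration of the encoding change only. The XML export route described later in this section is the one the text itself recommends.

```python
def convert_to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode single-byte extended ASCII data as UTF-8.

    source_encoding is an assumption: use the code page the data was
    originally written in.
    """
    try:
        # Data that already decodes as UTF-8 (including pure 7-bit ASCII)
        # is returned unchanged.
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        pass
    # Interpret each byte in the assumed code page, then write UTF-8.
    return data.decode(source_encoding).encode("utf-8")


converted = convert_to_utf8(b"caf\xe9")   # 0xE9 is "é" in Windows-1252
unchanged = convert_to_utf8(b"plain text")
```

Note that a file which happens to decode as UTF-8 is passed through untouched, so a wrong guess about its original purpose is not corrected by this function.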

In versions of OpenROAD before 5.1, source applications and components were always stored on disk as ASCII and extended ASCII characters. These exports consisted primarily of 7-bit clean ASCII characters in the range 0x00 to 0x7F, but could also contain extended ASCII characters in the range 0x80 to 0xFF. The OpenROAD 6.2 runtime can now handle Unicode code points.

Applications and components that were exported from earlier versions of OpenROAD in the proprietary format and are composed entirely of 7-bit ASCII can be imported directly into OpenROAD 6.2 as valid UTF-8 encodings of Unicode code points. (The first 128 code points, U+0000 to U+007F, match the ASCII representation of the characters.) However, components that contain extended ASCII bytes between 0x80 and 0xFF cannot be imported successfully from the proprietary exports of any version of OpenROAD. If you can import these components into OpenROAD 5.1 and use its XML export feature to export the source applications and components, the OpenROAD runtime encodes them as UTF-8 encodings of Unicode code points. These XML applications can then be imported into OpenROAD 6.2, even if the original source contains characters that would be represented by extended ASCII characters in proprietary exports. This is possible because the XML-based formats are stored on disk as UTF-8-encoded Unicode code points.
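The claim that 7-bit ASCII is already valid UTF-8 can be checked directly. The following minimal Python sketch (the variable names are illustrative only) decodes every byte value from 0x00 to 0x7F both ways and compares the results.

```python
# The 128 byte values 0x00-0x7F form the 7-bit ASCII range.
seven_bit = bytes(range(0x80))

as_ascii = seven_bit.decode("ascii")
as_utf8 = seven_bit.decode("utf-8")

# Both decodings give the same text, and each character's code point
# equals the original byte value.
same_text = as_ascii == as_utf8
same_values = [ord(ch) for ch in as_utf8] == list(seven_bit)
print(same_text, same_values)  # prints: True True
```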

During application or component import, if OpenROAD 6.2 encounters a byte that is intended to be an extended ASCII character, it treats that byte as part of a multi-byte UTF-8 encoding. For instance, the file being imported could contain the byte sequence 0xE6 0x95 0x95. Treated as extended ASCII, these three bytes would be displayed as the three characters “æ••”. Treated as UTF-8, the same three bytes form a single multi-byte sequence that represents one character, the Unicode Han character U+6555, displayed as “敕”. Although the three bytes are valid both as extended ASCII and as a UTF-8 encoding of Unicode, their proper representation in OpenROAD 6.2 depends on their original purpose and use. When importing such a file in the proprietary format, versions of OpenROAD prior to 6.2 assume that the content represents extended ASCII. OpenROAD 6.2 running as a Unicode runtime assumes that the content represents UTF-8-encoded Unicode code points. If a file contains extended ASCII rather than UTF-8-encoded code points, the import will normally fail. In the unlikely case that such an application is imported successfully into OpenROAD 6.2, it may display invalid text, not work as expected, fail to compile, or issue a runtime error.
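The two readings of the byte sequence above can be reproduced in a few lines of Python. This sketch assumes that "extended ASCII" means the Windows-1252 code page, which is the code page in which 0x95 is the bullet character; other single-byte code pages would render the bytes differently. The second part shows a typical extended ASCII string that is not valid UTF-8 at all.

```python
raw = bytes([0xE6, 0x95, 0x95])

# Read as single-byte extended ASCII (Windows-1252 assumed): three characters.
as_extended = raw.decode("cp1252")

# Read as UTF-8: one three-byte sequence encoding a single code point.
as_utf8 = raw.decode("utf-8")

print(len(as_extended), as_extended)
print(len(as_utf8), as_utf8, f"U+{ord(as_utf8):04X}")

# Most extended ASCII text is not valid UTF-8 and is rejected outright.
# 0xE9 is "é" in Windows-1252, but as UTF-8 it starts a three-byte
# sequence that is never completed.
try:
    b"caf\xe9".decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True
```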

When importing applications or components, OpenROAD 6.2 checks whether the file is a valid UTF-8-encoded file. (A 7-bit clean ASCII file with no extended ASCII characters is also a valid UTF-8-encoded file with the same semantic meaning.) If the file contains extended ASCII characters or has become corrupted, it cannot be imported into OpenROAD 6.2 with the same semantic meaning. In most cases, OpenROAD displays the error message E_DO0090, indicating that an invalid UTF-8 encoding was detected and that either the file contains extended ASCII characters or the file has been corrupted.

If you receive the invalid UTF-8 encoding error during the import process and the source is a proprietary-format OpenROAD export, you must determine whether the file consists entirely of 7-bit clean ASCII characters in the range 0x00 to 0x7F, which is a valid UTF-8 encoding. You must ensure that the file contains only valid UTF-8 encodings and then import the file again.
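One way to make this determination outside OpenROAD is to inspect the raw bytes of the export file. The following Python sketch is an illustration only: the function name classify_export is hypothetical, and the check it performs is a generic UTF-8 validity test, not the check OpenROAD itself runs. It reports whether the data is 7-bit clean ASCII, valid UTF-8 with multi-byte sequences, or invalid, and in the last case returns the offset of the first offending byte.

```python
from typing import Optional, Tuple


def classify_export(data: bytes) -> Tuple[str, Optional[int]]:
    """Return (kind, offset) where kind is "ascii", "utf-8", or "invalid".

    offset is the index of the first byte that breaks UTF-8 decoding,
    or None when the data is valid.
    """
    if data.isascii():
        # Only bytes in the range 0x00-0x7F: valid ASCII and valid UTF-8.
        return "ascii", None
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return "invalid", err.start
    return "utf-8", None


results = [
    classify_export(b"plain ASCII text"),
    classify_export("\u6555".encode("utf-8")),
    classify_export(b"caf\xe9 au lait"),  # 0xE9 is extended ASCII, not UTF-8
]
```

To check a file on disk, read it in binary mode (for example with open(path, "rb")) and pass the bytes to the function; the reported offset shows where to start looking for extended ASCII characters or corruption.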

Note: The best way to ensure UTF-8 compliance is by exporting the component or application to XML format using the XML export feature. This will ensure that application and component export files can be reimported cleanly into OpenROAD environments that support Transparent Unicode.