URI Connections
A Uniform Resource Identifier (URI) is an addressing scheme, of which a Uniform Resource Locator (URL) is one type. Types of URIs include URLs that use the http:// and ftp:// schemes.
The complete URI describes the mechanism used to access the resource, the name of the computer where the resource is located, and the name of the resource (a file name) on that computer. The letters to the left of the colon indicate the access scheme (protocol). The format of the alphanumeric information to the right of the colon depends on the scheme and contains the host address and the path to the file you want to access. The port number that serves the protocol follows a second colon to the right of the host name. If the port number is omitted, the access protocol attempts a connection using the default port for the specified protocol. For example, the http service uses the default port 80. Therefore, http://hostname:80/filename and http://hostname/filename are both valid and refer to the same resource.
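The default-port behavior described above can be sketched in Python with the standard urllib.parse module. This is only an illustration, not how DataConnect resolves ports internally; the DEFAULT_PORTS table and the effective_port and describe helpers are hypothetical names introduced for this example.

```python
from urllib.parse import urlsplit

# Well-known default ports for a few schemes (illustrative, not exhaustive).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def effective_port(uri):
    """Return the explicit port if one is given, else the scheme's default."""
    parts = urlsplit(uri)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


def describe(uri):
    """Split a URI into the pieces named in the text above."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": effective_port(uri),
        "path": parts.path,
    }


# Both forms resolve to the same scheme, host, port, and path.
print(describe("http://hostname:80/filename"))
print(describe("http://hostname/filename"))
```

Under these assumptions, the two URIs from the example produce identical descriptions, because the missing port falls back to 80.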
URI Support In Actian DataConnect
All file-based connectors, scripting functions, and object references are supported.
The following URI formats are supported in the Map Editor.
• http://[address]
• https://[address]
• file:///[path to file]
• ftp://[server]/[path to file]
• djmessage:///[name of message object]
• gzip:///[path to gzip file]
The following formats are also supported:
• djstream:///[path to dll]
• stdin:///
• stdout:///
• stderr:///
• anythingelse:///
Do not include brackets [ ] when specifying the URI scheme in Source File/URI (Sources tab) or Target File/URI (Targets tab).
Internet URL structure uses two slashes (//) to the right of the scheme (protocol) and colon. DJObject resource locator structure uses three slashes (///). The scheme determines the number of slashes that follow the colon.
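As a rough sketch of the slash rule, the following Python helper joins a scheme and a target with two or three slashes. The build_uri function and the INTERNET_SCHEMES set are hypothetical names for this example, and the grouping of schemes is inferred from the formats listed above rather than taken from the product.

```python
# Schemes that use the Internet URL structure (two slashes, then a host).
# This grouping is an assumption based on the formats listed above.
INTERNET_SCHEMES = {"http", "https", "ftp"}


def build_uri(scheme, target=""):
    """Join a scheme and target with the number of slashes the scheme expects."""
    scheme = scheme.lower()
    if scheme in INTERNET_SCHEMES:
        return f"{scheme}://{target}"
    # Every other scheme is treated as a DJObject-style locator (three slashes).
    return f"{scheme}:///{target}"


print(build_uri("http", "www.domainname.com/file.htm"))
print(build_uri("djmessage", "MyMessage"))  # "MyMessage" is a made-up object name
print(build_uri("stdin"))
```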
gzip is a compressed file format. A gzip file is automatically decompressed, and its contents are used within the map.
Note: Dynamic content is not supported. You may have to save the web page to disk before parsing it.
Internet URL Format
The following is a valid internet URL format:
scheme://[username:password@]domainname:portnumber/path
where:
• scheme: Access protocol (service) such as HTTP or FTP.
• : (colon): Marks the end of the scheme. Everything after the colon and the two slashes, up to the third slash (/), identifies the host. The host name is in RFC 1034 format.
• username:password@: User name and password are optional.
• domainname: Host name or IP address of the server where the resource is located.
• :portnumber: Optional. If the port is not specified, the scheme uses its default port and the second colon is omitted.
• /path: The third slash informs the scheme that everything that follows the slash is the hierarchical path to the resource, including file name.
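The components listed above can be assembled and taken apart again with Python's urllib.parse. This is a minimal sketch under stated assumptions: build_url is a hypothetical helper, not a DataConnect function, and percent-encoding the user name and password is a general URL convention rather than something this document specifies.

```python
from urllib.parse import quote, urlsplit


def build_url(scheme, host, path, username=None, password=None, port=None):
    """Assemble scheme://[username:password@]domainname[:portnumber]/path."""
    userinfo = ""
    if username is not None:
        # Escape characters such as "@" or ":" that would break the structure.
        userinfo = quote(username, safe="")
        if password is not None:
            userinfo += ":" + quote(password, safe="")
        userinfo += "@"
    netloc = userinfo + host
    if port is not None:
        netloc += f":{port}"
    if not path.startswith("/"):
        path = "/" + path
    return f"{scheme}://{netloc}{path}"


url = build_url("ftp", "server.domain.com", "subdirectory/file.asc",
                username="username", password="password")
print(url)

# Splitting the URL recovers the individual components.
parts = urlsplit(url)
print(parts.scheme, parts.username, parts.hostname, parts.port, parts.path)
```

When no port is passed, the second colon is left out and the parsed port is None, which matches the description of :portnumber above.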
Note: Within the URL, the case and file extension must be correct. If the URL returns "Error 404 Resource Not Found" or another form of this error, verify the case and file extension and retry. For example:
The URL http://www.domainname.com/file.htm may not locate http://www.domainname.com/File.html. Instead, it may return "Error 404-Resource Not Found" or default to the domain error page.
URI Parameters
The following table provides the description for the URI parameters.
Escape Characters
Valid URIs require character strings that are consistent with the established standards. Some URIs may contain reserved (non-valid) characters such as a space or a hash. These non-valid characters must be escaped for the URI to be valid.
Whether a character is reserved depends on the URI component in which it appears, and the escaping syntax depends on the context. Because of this ambiguity, use reserved or potentially reserved characters with caution and avoid them where possible. If you cannot avoid them, make sure you understand which characters are reserved, the context in which they are reserved, and how to escape them. For example, http://server/directory/file 3 changes to http://server/directory/file%203. The space character is replaced with the percent sign followed by its ASCII hex value, 20.
Note: The hash ("#", ASCII 23 hex) character is reserved. It is used as a delimiter to separate the URI of an object from a fragment identifier. If the hash character is used as a valid character in a URI, it must be escaped.
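Percent-escaping of this kind can be sketched with the quote and unquote functions in Python's urllib.parse. The escape_path helper is a hypothetical name for this example, and keeping the slash unescaped assumes it is being used as the path separator.

```python
from urllib.parse import quote, unquote


def escape_path(path):
    """Percent-escape a URI path, keeping "/" as the path separator."""
    return quote(path, safe="/")


# A space becomes %20 and a hash becomes %23.
print("http://server" + escape_path("/directory/file 3"))
print("http://server" + escape_path("/directory/report#1.txt"))

# unquote reverses the escaping.
print(unquote("/directory/file%203"))
```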
For more information, see Uniform Resource Identifiers (URI) at http://www.ietf.org.
Non-routable Addresses
Some IP address ranges are reserved for specific purposes such as private networks, where an individual computer or group of computers does not need to be exposed directly to the Internet. The following three ranges of addresses are reserved for this purpose:
• 10.0.0.0 - 10.255.255.255
• 172.16.0.0 - 172.31.255.255
• 192.168.0.0 - 192.168.255.255
When non-routable IP addresses are used, network routers within the organization route the traffic using Network Address Translation (NAT), typically together with Dynamic Host Configuration Protocol (DHCP). Because the non-routable IP ranges are not registered in the Internet router and domain name server tables, the local router adds an identifying code to the headers of the packets it receives from local machines. The local router uses these identifiers to direct traffic to and from local computers. Non-routable addresses provide an additional layer of security.
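To check whether a host address falls in one of the three reserved ranges, a short Python sketch using the standard ipaddress module might look like the following. The is_non_routable function is a hypothetical helper; the CIDR blocks are the standard notation for the ranges listed above.

```python
import ipaddress

# The three reserved ranges from the list above, written as CIDR blocks.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),       # 10.0.0.0 - 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),    # 172.16.0.0 - 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),   # 192.168.0.0 - 192.168.255.255
]


def is_non_routable(address):
    """Return True if the address lies in one of the reserved private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_NETWORKS)


for candidate in ("10.1.2.3", "172.31.255.255", "172.32.0.1", "192.168.1.10"):
    print(candidate, is_non_routable(candidate))
```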
Limitations
• Map Designer can connect to sources that can be represented as a file or file stream such as WHOIS and HTTP. It cannot connect to interactive session resources such as TELNET or RLOGIN.
• You can send data to targets using HTTP and FTP protocols. However, the URI structure is different. For example:
ftp://username:password@server.domain.com/subdirectory/file.asc
• Map Editor sometimes times out before connecting because of a slow network. To resolve this, add the following line to the [UserInfo] section of the cosmos.ini file:
NetworkTimeout=5
The default is 5 seconds. If you have problems connecting, increase the network timeout to a value greater than 5.
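If you prefer to script the change, the following Python sketch updates the setting with the standard configparser module. It is only an illustration: set_network_timeout is a hypothetical helper, the value 30 is an arbitrary example, and the code assumes the file is a simple INI file that configparser can read. Note that configparser discards comments when it rewrites a file, so back up cosmos.ini before applying anything like this to a real installation.

```python
import configparser
import io


def set_network_timeout(ini_text, seconds):
    """Return ini_text with NetworkTimeout set in the [UserInfo] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve key case, for example NetworkTimeout
    parser.read_string(ini_text)
    if not parser.has_section("UserInfo"):
        parser.add_section("UserInfo")
    parser.set("UserInfo", "NetworkTimeout", str(seconds))
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


# In-memory sample standing in for the relevant part of cosmos.ini.
sample = "[UserInfo]\nNetworkTimeout=5\n"
updated = set_network_timeout(sample, 30)
print(updated)
```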