URI Connections
A Uniform Resource Identifier (URI) is an addressing technology from which a Uniform Resource Locator (URL) is created. Types of URIs include URLs that use the http:// and ftp:// protocols.
The complete URI describes the mechanism used to access the resource, the name of the computer where the resource is located, and the name of the resource (a file name) on that computer. The letters to the left of the colon indicate the access scheme (protocol). The format of the alphanumeric information to the right of the colon depends on the scheme and contains the host address and the path to the file you want to access. The port number that serves the protocol follows a second colon to the right of the host name. If the port number is omitted, the access protocol attempts a connection using the default port for the specified protocol. For example, the HTTP service uses the default port 80. Therefore, http://hostname:80/filename and http://hostname/filename are both valid and refer to the same resource.
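The default-port rule described above can be sketched in a few lines of Python using the standard urllib.parse module. This is only an illustration: the DEFAULT_PORTS table and the effective_port helper are names invented for this sketch and are not part of the product.

```python
from urllib.parse import urlsplit

# Illustrative table of well-known default ports (an assumption for this sketch).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def effective_port(uri):
    """Return the explicit port, or the scheme's default when none is given."""
    parts = urlsplit(uri)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)

print(effective_port("http://hostname:80/filename"))  # 80
print(effective_port("http://hostname/filename"))     # 80
```

Both calls report port 80, which is why the two forms of the URL address the same resource.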
URI Support In DataConnect
All file-based connectors, scripting functions, and object references are supported.
The following URI formats are supported in the Map Editor.
http://[address]
https://[address]
file:///[path to file]
ftp://[server]/[path to file]
djmessage:///[name of message object]
gzip:///[path to gzip file]
The following formats are also supported:
djstream:///[path to dll]
stdin:///
stdout:///
stderr:///
anythingelse:///
Do not include brackets [ ] when specifying the URI scheme in Source File/URI (Sources tab) or Target File/URI (Targets tab).
Internet URL structure uses two slashes (//) to the right of the scheme (protocol) and colon. DJObject resource locator structure uses three slashes (///). The scheme determines the number of slashes that follow the colon.
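One way to read the slash rule: the host part sits between the second and third slash, so three slashes in a row mean the host part is empty. The sketch below shows this with Python's urllib.parse. The describe helper, the file path, and the message object name MyMessage are hypothetical, and the result reflects how Python splits the string, not necessarily how the product interprets it.

```python
from urllib.parse import urlsplit

def describe(uri):
    """Split a URI into (scheme, host part, path)."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path

print(describe("http://hostname/filename"))  # ('http', 'hostname', '/filename')
print(describe("file:///tmp/data.csv"))      # ('file', '', '/tmp/data.csv')
print(describe("djmessage:///MyMessage"))    # ('djmessage', '', '/MyMessage')
```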
gzip is a compressed file format. A gzip file is decompressed automatically, and its contents are used within the map.
Note:  Dynamic content is not supported. You may have to save the web page to disk before parsing it.
Internet URL Format
The following is a valid internet URL format:
scheme://[username:password@]domainname[:portnumber]/path
where:
scheme: Access protocol (service) such as HTTP or FTP.
:(colon) - Informs the scheme that everything that follows the colon is the host name in RFC1037 format until the third slash (/).
username:password@: User name and password are optional.
domainname: Host name or IP address of the server where the resource is located.
:portnumber: Optional. If the port is not specified, the scheme uses its default port and the second colon is omitted.
/path: The third slash informs the scheme that everything that follows the slash is the hierarchical path to the resource, including file name.
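The components listed above can be pulled apart with Python's urllib.parse, as in the following sketch. The split_url helper and the example credentials, host name, and port are made up for illustration.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Return the components of an Internet URL as a dictionary."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,  # None when no user name is given
        "password": parts.password,  # None when no password is given
        "host": parts.hostname,
        "port": parts.port,          # None when the default port is implied
        "path": parts.path,
    }

info = split_url("ftp://user:secret@server.example.com:2121/subdirectory/file.asc")
for name, value in info.items():
    print(name, "=", value)
```

When the port is omitted, the port entry is None and the scheme's default port applies.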
Note:  Within the URL, the case and file extension must be correct. If the URL returns "Error 404 Resource Not Found" or another form of this error, verify the case and file extension and retry. For example:
A request for http://www.domainname.com/file.htm may not locate http://www.domainname.com/File.html. Instead, it may return "Error 404-Resource Not Found" or default to the domain error page.
URI Parameters
The following table provides the description for the URI parameters.
Parameter
Description
User
User name is optional. Some schemes (for example, FTP) allow you to specify a user name.
Password
Password is optional. A user name is required if a password is used. The password follows the user name, separated from it by a colon. The user name and password are followed by an @ sign. For example:
ftp://@host.com/ - has an empty user name and no password
ftp://host.com/ - has no user name
ftp://user:@host.com/ - has a user name called "user" and an empty password.
Within the user name and password fields, any ":", "@", or "/" must be escaped.
Host
Fully qualified domain names are a sequence of domain labels separated by periods. Each domain label starts and ends with an alphanumeric character and may also contain "-" characters. The last domain label does not start with a digit, which syntactically distinguishes domain names from IP addresses.
Port
Port number required to establish the connection. Most schemes use protocols that have a default port number; for HTTP, the default is 80. To specify a different port, add the port number after the host name, separated from it by a colon. For example:
http://somedomain.com:110 means access this host on port 110 instead of the default HTTP port 80.
URL Path
The remaining part consists of data specific to the scheme and is known as the URL path. It provides details of how the specified resource is accessed. The "/" between the host (or port) and the URL path is not part of the URL path. The URL path syntax and its interpretation depend on the scheme that is used. For example:
http://somedomain.com/private/myweb/vacation/pix/index.html
In this example, the host server is somedomain.com. The rest of the URI is the URL path.
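The difference between an empty and a missing user name or password, and the escaping rule for the credential fields, can be sketched with Python's urllib.parse. The credentials and ftp_uri helpers and the sample values are hypothetical and only illustrate the rules in the table.

```python
from urllib.parse import quote, urlsplit

def credentials(uri):
    """Return (user name, password); None means the field is absent."""
    parts = urlsplit(uri)
    return parts.username, parts.password

print(credentials("ftp://@host.com/"))       # ('', None)
print(credentials("ftp://host.com/"))        # (None, None)
print(credentials("ftp://user:@host.com/"))  # ('user', '')

def ftp_uri(user, password, host, path):
    """Build an FTP URI, escaping ":", "@", and "/" inside the credentials."""
    return "ftp://{}:{}@{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, path
    )

print(ftp_uri("a@b", "p:w/d", "host.com", "file.asc"))
# ftp://a%40b:p%3Aw%2Fd@host.com/file.asc
```

Passing safe="" to quote makes it escape "/" as well, which it would otherwise leave unchanged.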
Escape Characters
Valid URIs require character strings that are consistent with the established standards. Some URIs may contain reserved (non-valid) characters such as a space or a hash. These non-valid characters must be escaped for the URI to be valid.
Whether a character is reserved depends on the URI component in which it appears, and the syntax for escaping depends on the context. Because of this ambiguity, use reserved or potentially reserved characters with caution and avoid them if possible. If you cannot avoid them, it is important to understand which characters are reserved, the context in which they are reserved, and how to escape them. For example, http://server/directory/file 3 changes to http://server/directory/file%203. The space character is replaced with the percent sign and the ASCII hex value 20.
Note:  The hash ("#", ASCII 23 hex) character is reserved. It is used as a delimiter to separate the URI of an object from a fragment identifier. If the hash character is used as a valid character in a URI, it must be escaped.
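The escaping of the space and hash characters can be reproduced with the quote and unquote functions from Python's urllib.parse. This is a sketch only: the escape_path helper and the file names are made up for illustration.

```python
from urllib.parse import quote, unquote

def escape_path(path):
    """Percent-encode reserved characters in a path, keeping "/" as the separator."""
    return quote(path, safe="/")

print(escape_path("/directory/file 3"))  # /directory/file%203
print(escape_path("/reports/q#1.txt"))   # /reports/q%231.txt
print(unquote("/directory/file%203"))    # /directory/file 3
```

Only the path is escaped here. Escaping a complete URI in one call would also encode the colon after the scheme, so each component is best escaped separately.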
For more information, see Uniform Resource Identifiers (URI) at http://www.ietf.org.
Non-routable Addresses
Some IP address groups are reserved for specific purposes such as private networks, where individual computers or groups of computers do not need to be exposed directly to the Internet. The following three groups of addresses are reserved for this purpose:
10.0.0.0 - 10.255.255.255
172.16.0.0 - 172.31.255.255
192.168.0.0 - 192.168.255.255
When non-routable IP addresses are used, network routers within the organization route the traffic using Network Address Translation (NAT); the local addresses themselves are typically assigned through Dynamic Host Configuration Protocol (DHCP). Because the non-routable IP groups are not registered in the Internet router and domain name server tables, the local router assigns an identifying code to the packet headers it receives from local machines. The local router uses these identifiers to direct traffic from and to local computers. Non-routable addresses provide an additional layer of security.
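To check whether a host address falls in one of the three reserved groups, a sketch along the following lines may help. It uses Python's standard ipaddress module; the PRIVATE_RANGES list and the is_non_routable helper are illustrative names, and the three ranges are written in CIDR notation.

```python
import ipaddress

# The three reserved groups listed above, in CIDR notation.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),      # 10.0.0.0 - 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.0.0 - 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),  # 192.168.0.0 - 192.168.255.255
]

def is_non_routable(address):
    """Return True if the IPv4 address lies in one of the reserved groups."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_RANGES)

print(is_non_routable("192.168.0.10"))  # True
print(is_non_routable("172.32.0.1"))    # False
```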
Limitations
Map Designer can connect to sources that can be represented as a file or file stream such as WHOIS and HTTP. It cannot connect to interactive session resources such as TELNET or RLOGIN.
You can send data to targets using HTTP and FTP protocols. However, the URI structure is different. For example:
ftp://username:password@server.domain.com/subdirectory/file.asc
Sometimes Map Editor times out before connecting because of a slow network. To resolve this, add the following line to the [UserInfo] section of the cosmos.ini file:
NetworkTimeout=5
The default is 5 seconds. If you have problems connecting, increase the network timeout to a value greater than 5.
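If the setting needs to be changed by a script rather than by hand, something like the following sketch could be adapted. It assumes that the relevant part of cosmos.ini follows ordinary INI syntax that Python's configparser can read, which is not confirmed here, and it works on an in-memory sample rather than the real file. The set_network_timeout helper and the value 30 are illustrative. Note that configparser discards comments when it writes the text back out.

```python
import configparser
import io

# In-memory stand-in for the relevant part of cosmos.ini (an assumption).
SAMPLE_INI = """[UserInfo]
NetworkTimeout=5
"""

def set_network_timeout(ini_text, seconds):
    """Return ini_text with NetworkTimeout in [UserInfo] set to the given value."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive (NetworkTimeout)
    parser.read_string(ini_text)
    if not parser.has_section("UserInfo"):
        parser.add_section("UserInfo")
    parser.set("UserInfo", "NetworkTimeout", str(seconds))
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()

print(set_network_timeout(SAMPLE_INI, 30))
```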