BLOB Column Handling
Large Data Objects
Long, variable length data can be stored in columns of type LONG BYTE, LONG VARCHAR, and LONG NVARCHAR. Columns of these types are collectively referred to as Large Objects (LOB) and are further distinguished as binary (BLOB), character (CLOB), and National Character Set (NCS or Unicode – NLOB). Handling values of these types is somewhat different than smaller, fixed length types such as integers and strings and, depending on the representation, can place restrictions on how the data is retrieved and used, and impact the performance of an application.
The JDBC Driver can represent LOB values in three different ways:
• As a data stream within the application
• As a reference (called a LOCATOR) to the value stored in the database
• As a cached value
When first introduced, LOB values were only represented as streams of bytes or characters. Ingres 9.2 introduces the capability of retrieving a LOCATOR reference to a LOB value and accessing the value as it resides in the database through the reference. In addition, the JDBC Driver provides the capability of loading a LOB data stream or LOCATOR reference and caching the LOB data in the driver.
These three representations, how they are manifested in JDBC, and the impact they have on an application are discussed here.
LOB Data Streams
A LOB data stream value accompanies the other values in a set of parameters or columns. LOB data streams are serialized in order with the other values they accompany and must be processed entirely before accessing the values that follow. LOB data streams can be accessed only once per value. The driver declares LOB values, when represented as data streams, to be of type LONGVARBINARY or LONGVARCHAR.
A LOB data stream must be accessed and read completely prior to accessing any value that follows the LOB in a result set. When a value is accessed that follows an unaccessed or partially accessed LOB data stream, the driver must read and discard the remaining LOB data so that the requested value can be accessed. If an attempt is subsequently made to access the discarded LOB value, an SQLException is generated indicating that the LOB data is no longer accessible.
LOB data streams must also be read fully before making any further request on the associated connection. Because data from the DBMS Server is serialized on the connection, the results from additional requests on the connection are queued behind any unread LOB data. The JDBC Driver avoids conflicts resulting from multiple simultaneous requests on a connection by locking the connection for the duration of each request.
When a LOB data stream value is present in a result set, the connection is not unlocked until all the data in the row, including the LOB data, has been read. An attempt to make an additional request on a connection when a LOB column has not been read completely generates an SQLException indicating that a request was made before the prior request had completed.
LOB data streams can be accessed only once. Because LOB data streams are not cached, only one call (to getString(), getCharacterStream(), and so on) can be made for each LOB value in each row of the result set. Additional requests to access a LOB data stream value generate an SQLException indicating that the LOB data is no longer available.
In general, the following recommendation from the Sun JDBC documentation must be followed: “For maximum portability, columns within a row must be read in left-to-right order, and each column must only be read once. This reflects implementation limitations in some underlying database protocols.”
A result set containing a LOB data stream value is not able to perform READONLY cursor prefetching. Only one row of a result set is retrieved with each DBMS Server access when a LOB data stream is present. While this does not directly affect the JDBC application, row fetch performance is reduced when a result set contains a LOB data stream value.
The restrictions associated with LOB data streams can be avoided by configuring the driver to cache LOB data streams when received from the DBMS. When enabled, the driver reads LOB data streams as they are received and stores them in memory. All row column values are fully loaded when control is returned to the application.
An uncached LOB data stream can also be cached by accessing it using the getBlob() or getClob() method. Calling one of these methods satisfies the restrictions associated with LOB data streams and allows extended access to the LOB data using the Blob/Clob interface.
For further details on caching LOB data streams, see Cached LOB Values below.
LOB Locators
A LOB Locator is a reference to a LOB value stored in a database. Locators reduce the overhead of retrieving the entire LOB data value during row processing. Applications can use a Locator reference to retrieve the LOB data when and if it is determined that the data is needed. A Locator reference can also be used to perform certain operations on the LOB data while it resides in the database. The driver declares LOB Locator values to be of type BLOB or CLOB and wraps Locator values in objects which implement the JDBC Blob and Clob interfaces.
By default, the driver utilizes LOB Locators when supported by the DBMS. The driver can be configured to use LOB data streams instead of Locators by setting the following system property:
ingres.jdbc.lob.locators.enabled=false
When select loops are enabled, the DBMS streams all result rows back to the driver. Row returning database procedures also stream result rows. While a result stream is active, no other DBMS request is permitted. The driver does not utilize LOB Locators by default when result row streams are active since they cannot be used to access LOB data until after the result set is closed. The driver can be configured to use LOB Locators with result row streams by setting the following system property (LOB Locators must be enabled in general for this property to take effect):
ingres.jdbc.lob.locators.select_loop.enabled=true
Locators are valid during the transaction in which they are produced. Autocommit imposes a number of restrictions, depending on the autocommit mode, on the use of LOB Locators.
DBMS Mode
Locators remain valid during autocommit. Since the DBMS only supports a single active cursor during autocommit, Locators cannot be used to access a LOB value while a result set is active.
Single-Cursor Mode
Using a Locator to access a LOB value will cause any active result set to be closed.
Multi-Cursor Mode
Locators can be used to access a LOB value while the associated result set is active. Since autocommit is being simulated with standard transactions, all associated LOB Locators become invalid and unusable when their associated result set is closed.
Due to these restrictions, the driver does not utilize LOB Locators by default when autocommit is enabled. The driver can be configured to utilize LOB Locators during autocommit by setting the following system property (LOB Locators must be enabled in general for this property to take effect):
ingres.jdbc.lob.locators.autocommit.enabled=true
LOB values can be accessed through a Locator using the JDBC Blob/Clob objects returned by the ResultSet methods getBlob() and getClob(). The Actian X DBMS supports using LOB Locators to determine the length of the LOB data, search the LOB data, and read portions of the LOB data or the entire LOB value.
LOB values can also be accessed by using the other getXXX() methods supported for LOB data streams. The LOB value is retrieved from the database and converted to the form appropriate for the particular access method.
The Actian X DBMS does not support modifying LOB values using Locators. The JDBC Driver supports the Blob/Clob modification methods by reading and caching the LOB data and performing the modification on the cached value.
Cached LOB Values
The JDBC Driver will cache a LOB value in three circumstances:
• Caching of LOB data streams has been enabled with the system property ingres.jdbc.lob.cache.enabled.
• The application calls getBlob() or getClob() for a LONGVARBINARY or LONGVARCHAR column.
• The application calls a Blob/Clob modification method on an object representing a LOB Locator.
The driver can be configured to automatically cache LOB data streams by setting the following system property:
ingres.jdbc.lob.cache.enabled=true
Automatically caching LOB values requires sufficient memory to hold all active LOB values. Memory resources may be severely impacted when caching is enabled. To reduce the impact of extremely large LOB values, the LOB cache stores values as a series of blocks or segments. The default size of a segment is 8192 bytes or characters. The segment size is a trade off between the number of segments needed to store a value, the size of memory blocks needed to hold a segment, and the amount of unused space in the last segment. The segment size can be configured using the following system property:
ingres.jdbc.lob.cache.segment_size=<size>
The driver provides compatibility between the LOB data stream and Locator representations by allowing the same getXXX() method calls for both types. The driver supports getBlob() and getClob() methods for LOB data streams by caching the LOB data in the Blob/Clob object.
The driver supports Blob/Clob methods that write or truncate LOB values by reading the LOB data from the database (if necessary) and caching the data in the Blob/Clob object. Modify operations therefore modify a copy of the LOB data stored in the driver. The modified data is not automatically propagated to the database. An application can write modified LOB data back to the database by updating the row holding the LOB and providing the Blob/Clob object holding the modified data as a parameter using the setBlob(), setClob(), updateBlob(), or updateClob() methods.