Inconsistent Databases and Recovery
An inconsistent database occurs when administrative changes to a database in the transaction log do not agree with information maintained in the database’s configuration file.
The main causes of inconsistent database errors are improper system administration procedures. These include:
• Initializing the transaction log file with the -force_init_log flag of rcpconfig while the log file still contains open transactions
• Moving or altering files or installation variables without using the appropriate utility such as ingbuild or unloaddb
• Improper procedures when recovering Vector data from operating system backups
Inconsistent database errors can also be caused by hardware problems or software problems. For example, inconsistent database errors can be caused when the transaction log file or the configuration file has been corrupted by a hardware failure. Inconsistent database errors can also be caused by software bugs. In either case, contact technical support.
Automatic Recovery
Vector automatically handles the transaction failures that cause most database inconsistencies.
Recovery During Normal Operation
If a user program exits or a transaction is aborted for some other reason, the DBMS Server automatically handles transaction rollback. This does not cause an inconsistent database.
Recovery at Shutdown
At shutdown, all users must have exited their sessions; therefore, all transactions are committed. If users exited their sessions abnormally, the DBMS Server aborts any open transactions associated with the aborted sessions. Very long transactions take time to roll back and cause ingstop to seem to hang. The DBMS Server process cannot exit normally until it finishes recovering the aborted transactions.
If transactions are being rolled back on shutdown, allow the DBMS Server to finish this task before shutting down. If you do not, longer delays occur at startup time while the recovery process is performing rollback.
Recovery at Startup
If transactions have been aborted and were not recovered by a normal shutdown, upon restart the recovery process performs recovery. This occurs, for example, if:
• Processes are forcibly killed from the operating system
• The machine is rebooted
• Power to the system is interrupted
The recovery process performs the following steps upon startup:
1. Reads the transaction log file. If there has not been a normal shutdown, the recovery process detects that databases are inconsistent—that is, that Vector previously exited without completing all the transactions required for system and database consistency.
2. Proceeds through the transaction log file to back out uncommitted transactions and complete committed fast-commit transactions until the databases are again in a consistent state. While recovery is proceeding, no user interfaces can connect to a database.
Recovery actions are logged in the file II_SYSTEM/ingres/files/iircp.log.
Recovery Process Monitoring
If you are monitoring Vector startup after a machine reboot, the following messages are displayed:
Starting Vector Name Server...
Starting Vector Communications Server...
Starting Vector Recovery Process...
If the transaction log contained uncommitted transactions when the machine failure occurred, the startup script pauses while the recovery process recovers transactions from the transaction log file. No messages are printed to the screen.
If you are in doubt as to whether recovery is taking place during startup, or to monitor the recovery process, use the following procedure.
Display the recovery process log file by typing the following command at the operating system prompt:
Windows:
%II_SYSTEM%\ingres\files\iircp.log
Linux:
tail -f iircp.log
If the system is recovering, the recovery actions are logged to the IIRCP.LOG file. This indicates that Vector is automatically recovering from possible inconsistencies.
Messages are printed to the log file during recovery:
• The message at the beginning of recovering transactions indicates that transaction recovery has begun.
• Intermediate messages track recovery progress. As recovery proceeds, progress messages (for example, “Recovered 31 of 130 transactions”) are displayed.
• When done, the following message is printed:
Recovery complete.
You can also use an operating system command to determine whether the recovery process is recovering transactions by checking to see if it is accumulating CPU time.
Linux:
On Linux, you can also monitor the files in the database directory of the database you suspect of being the target of the updates that are being backed out. The following command entered at the operating system prompt shows whether data files are being updated:
ls -lt
The file most recently updated is listed first along with the time of last update.
If any of the monitoring techniques above indicate that transaction recovery is taking place, continue to monitor the recovery process until recovery has completed. When the recovery process is complete, CPU time is not accumulated.
After the recovery process has finished, restart the installation with the ingstart command. The ingstart utility first shuts down and brings up all required installation processes. Programs can connect to the databases.
Inconsistent Database
If you receive an “inconsistent database” error after recovery is complete, it means updates and modifications were not properly completed or rolled back, and the database is therefore in an inconsistent state.
Following are examples of “inconsistent database” errors that indicate your database has become inconsistent:
E_DM0100 DB_INCONSISTENT Database is inconsistent
E_US0026 Database is inconsistent. Please contact the ingres system manager
E_DM9327 BAD_OPEN_COUNT
Diagnose an Inconsistent Database
Diagnose the cause and extent of an inconsistent database problem before you attempt to recover your database. Knowing the cause of the problem is essential to choosing the proper recovery procedures. Once a database has been rolled forward from a checkpoint, recovered from an operating system backup, or forced consistent, you cannot determine the cause of inconsistency.
To diagnose the cause and extent of an inconsistent database problem:
1. Read and save the full text of the error messages in errlog.log and iircp.log.
2. Run the infodb command (from the operating system prompt or from the Database menu in Actian Director or VDBA) to read the database's configuration file and identify the cause of inconsistency.
If the configuration file can be opened and read, the cause of the inconsistency is displayed. Save the output of infodb for technical support.
If the database's configuration file, "aaaaaaaa.cnf", cannot be read, it is corrupted. You need to recover from a backup, as described in
Recover an Inconsistent Database.
3. Review the history of your Vector installation. Look for improper system administration procedures that have caused the database to become inconsistent. See the table in
Common Causes of Inconsistent Databases.
4. Report your problem to technical support. If inconsistent database was not caused by incorrect system administration procedures, hardware failure, or known operating system software bugs, record the information, as discussed in
What You Need Before Contacting Actian Support.
Common Causes of Inconsistent Databases
Common causes of inconsistent databases are:
• Operating system backups
• Incorrect installation paths
• Disabling of logging/recovery system
• Use of unsupported hardware configuration
Inconsistencies Due to Operating System Backups
To recover a database from an operating system backup that was made while the installation was running, see
Recover an Inconsistent Database.
Inconsistencies Due to Incorrect Installation Paths
Changing Vector installation variables (such as II_SYSTEM, II_DATABASE, II_CHECKPOINT, II_JOURNAL, II_DUMP, or II_WORK) without using proper procedures, causes inconsistency between the information stored in the installation variables and those stored in the database configuration file “aaaaaaaa.cnf”.
Database inconsistency can occur if you move a database, table, application or some other object by using operating system commands rather than the supported Vector utilities. If the inconsistency is the result of moving a database from another location or installation without using unloaddb, you must remove the database using destroydb, recreate the database using createdb, and repopulate the database using the unloaddb utility.
A database file can become corrupted from hardware or software failures of various kinds. A data file can be inadvertently deleted by hand, but this is rare because only the user who owns the installation can write to the database directories.
If you are in doubt about whether transactions are being recovered, run the logstat utility and examine the “Status” field. It is marked RECOVER if in the recovery state. While a recovery is taking place, for example when restarting after a system failure, the recovery process requires time to read through the transaction log file to back out uncommitted transactions and complete fast-commit transactions. To users, the system appears to hang.
Examine Configuration File of a Database
To examine the configuration file of your database, enter the infodb command at the command prompt. You can also use the Database menu in Actian Director or VDBA.
1. Compare the path information for the checkpoint, journal, data and dump locations with that defined for these environment variables as displayed by the following command:
Windows:
ingprenv
Linux:
ingprenv
2. Return the installation variables to the values displayed to by infodb, if the values have changed. If these values are not the same, the installation variables have been changed, or the database has been imported from some other Vector installation.
3. If you need to change the existing values of Vector installation variables or import a database from another site, you must use the unloaddb utility, as this creates a new, up-to-date configuration file for the database. For a discussion of Vector environment variable that cause an inconsistent database if changed after installation is completed, see the chapter “Environment Variables.”
Recovery Rules
The following are rules that you should keep in mind about the recovery of transactions:
• It takes at least as long to recover aborted transactions as it took to execute them originally.
• The amount of time required for recovery depends on the number of users and transactions, transaction semantics (whether autocommit is set), and the consistency point interval.
• While recovery is proceeding, all users are denied access to databases. Any attempt to connect to a database at this point returns an error such as the following:
E_LQ0001_STARTUP gca protocol service request failure.
• Database inconsistency can occur if a user or the system administrator attempts to “force” entry into the installation by running rcpconfig with the -force_init_log flag (thus erasing the transaction log file) before the recovery system has finished rolling back the uncommitted transactions during recovery.
After a system failure, monitor recovery and always allow it to proceed until the “Recovery Complete” message appears in iircp.log.
Inconsistencies Due to Disabling of Logging or Recovery System
By disabling the logging or recovery system, the DBA can temporarily turn off logging for the database to speed bulk loading of data. If logging has been turned off for this database, an error message appears in the error log file. Typically this message is:
E_DM9060_SESSION_LOGGING_OFF Database dbname has been updated by a session running with SET NOLOGGING defined.
If the database has become inconsistent, you can check for this error message by typing the following command:
Windows:
findstr DM9060\ %II_SYSTEM%\ingres\files\errlog.log
Linux:
grep DM9060 \$II_SYSTEM/ingres/files/errlog.log
If the SESSION_LOGGING_OFF error message appears, logging was disabled on this database. If the message in the error log was written later than the most recent checkpoint of this database, the database must be restored from the checkpoint. To determine if this is the case, compare the timestamp on the error message in errlog.log with the timestamp in the “checkpoint history” field of the output from the command infodb dbname.
For details on SET NOLOGGING, see the User Guide.
Database Inconsistencies Due to Use of Unsupported Hardware Configurations in Linux
Database inconsistencies can be caused by the use of unsupported hardware configurations on NFS. In systems that include Network File System (NFS) mounts, be aware that Vector:
• Supports NFS client installation configurations in which the DBMS Server process and data directories on one node are accessed by application programs executing on another.
• Does not support running DBMS servers on one node and accessing data directories on another network node using NFS. The configuration can cause undetected write errors that lead to database inconsistency.
To check your configuration, type mount at the operating system prompt. Make sure that the data directories (II_DATABASE, II_CHECKPOINT, II_JOURNAL and II_LOG_FILE) are not NFS-mounted from a remote node.
Recover an Inconsistent Database
The recommended method of recovering an inconsistent database is to use the Restore operation in Actian Director or rollforwarddb operation in VDBA. You can also enter the rollforwarddb command from the command line.
If no Vector checkpoint exists, you can recover from an operating system backup.
Make Inconsistent Database Consistent
The recommended way to make your inconsistent database consistent is to use rollforwarddb. It recovers the database from a previous checkpoint and, if journaling was enabled, applies the associated journals. For the full procedure, see the User Guide.
Use of Operating System Backup
We do not support nor recommend the use of operating system backups as your primary means of ensuring database recoverability. The Vector checkpoint and journaling programs provide the secure way to ensure that your data is recoverable.
IMPORTANT! Operating system backups must be used only as a last resort, when Vector checkpoints have been lost or destroyed, and only under the direction of technical support.
When No Backup Exists
If you have an inconsistent database for which no checkpoints or operating system backups exist, you can still gain access to that database and attempt to salvage the data using the verifydb utility.
The verifydb utility can be used to unset the “inconsistent database” flag in the configuration file “aaaaaaaa.cnf”. This permits access to the database; it does not, however, make the data consistent. If verifydb is used to force access to a database that is inconsistent, the state of the database remains unknown. Such a database becomes unsupportable by technical support. Data can be lost and problems occur weeks or months later. Technical support cannot diagnose the state of such a database because the built-in consistency checks have been overridden.
The format “verifydb -oforce_consistent” does not recover a database. It merely allows access and continued operation to a database that is in an inconsistent state.
Gain Emergency Access to an Inconsistent Database Using verifydb
If you must use verifydb to gain emergency access to data in an inconsistent database, do so as follows:
2. Back up the database directory at the operating system level.
3. Run verifydb in report mode by typing the following command at the operating system prompt:
verifydb -mreport -sdbname "dbname" -odbms_catalogs -u$ingres
Verifydb output is logged in II_SYSTEM/ingres/files/iivdb.log:
4. To repair inconsistencies in the system catalogs interactively and force the database consistency flag, type the following commands at the operating system prompt:
verifydb -mruninteractive -sdbname "dbname" -oforce_consistent -u$ingres
verifydb -mruninteractive -sdbname "dbname" -odbms_catalogs -u$ingres
5. Call Technical Support if additional assistance is required to resolve the inconsistency.