8. Troubleshooting Vector


Process of Troubleshooting

Troubleshooting is a process of defining and correcting a problem that occurs in an otherwise functioning Vector installation and includes narrowing down the problem to a well-defined point, identifying the cause of failure, and eliminating it. When troubleshooting, you:

1. Determine the nature of the problem

2. Isolate the problem to a defined area

3. Eliminate the cause, using the following techniques:

• Correcting user errors

• Changing the Vector installation environment

• Changing the user environment

• Changing the operating system environment

• Changing the system configuration

• Restarting the system if needed

Tools for Troubleshooting

You can perform system administrator functions—including configuration, performance monitoring, backup and recovery, and remote database optimization—using a variety of tools:

• Commands—For information, see the Command Reference.

• Actian Director—For instructions, see the Director help.

Note: Director does not contain the full set of tools found in VDBA.

• VDBA—For instructions see the VDBA online help.

Determine the Problem Area

The first step in the process of troubleshooting is to determine the problem area. The following troubleshooting flow chart shows the major problem categories:

Error Log Files

The error log files are located in the directory indicated by the Vector environment variable/logical II_CONFIG. The log files are as follows:

vectorwise.log

The error log for the X100 Engine

errlog.log

The DBMS error log and the default log for most programs.

iiacp.log

Archiver error log.

iircp.log

Recovery error log.

The names of optional log files can vary, but all log files end with the extension LOG. Optional log files include:

II_DBMS_LOG

DBMS error log

II_GC_LOG

GCC trace log

To display the value for II_CONFIG, type the following command at the operating system prompt:

ingprenv

View List of Log Files

For a list of log files, type the following command at the operating system prompt from the directory indicated by $II_CONFIG or $II_LOG:

ls *.log

Check the Error Log Files

Checking the error logs is the first step in determining the nature of the problem. To check the error logs, follow these steps:

1. Check the log files for error messages. Examine errlog.log and vectorwise.log first.

2. If indicated, check optional log files for error messages.

3. Identify the errors associated with your problem.

All errors are time stamped. Find the most recent error message associated with your problem. Read back up the log file from there to the first error relating to that problem. Many error messages cascade from that initial error. This initial error is usually the most important in identifying the problem even though it is not the error that users report.

For example, the following error message merely notifies you that a DBMS server has exited for one of many possible reasons:

E_SC0221_SERVER_ERROR_MAX

You must look in the errlog.log for associated DBMS server errors such as “E_DM9300_DM0P_CLOSE_PAGE Buffer still fixed,” a fatal error message. Search the error log for additional errors occurring around the time of the error that was displayed.
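Part of this procedure can be scripted. The following is a minimal sketch, not a Vector utility: the function name list_error_codes is hypothetical, and the pattern assumes error identifiers shaped like E_SC0221_SERVER_ERROR_MAX (E_, two facility letters, four hexadecimal digits, and a name). It lists each distinct identifier in a log file in order of first appearance, which can help locate the initial error in a cascade.

```shell
# list_error_codes FILE...
# Hypothetical helper, not part of Vector. Prints each distinct error
# identifier found in the given log files, in order of first appearance.
# Assumes identifiers look like E_SC0221_SERVER_ERROR_MAX.
# Uses grep -o and -h, which GNU and BSD grep provide.
list_error_codes() {
    grep -ohE 'E_[A-Z]{2}[0-9A-F]{4}_[A-Z0-9_]+' "$@" |
        awk '!seen[$0]++'    # keep only the first occurrence of each identifier
}
```

For example, running list_error_codes errlog.log from the directory indicated by II_CONFIG prints one identifier per line. Search the log for the first identifier printed to read its full message and time stamp.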

Find Your Problem Category

To find the problem category, follow these steps:

1. Determine the general category in which your problem belongs. The following categories relating to running the Vector installation are described in this guide:

• Vector startup and shutdown

• Vector configuration

• Vector tools startup

• Inconsistent database or recovery

• Operating system performance

For a description of problems relating to queries and performance, see the User Guide.

2. Use the information in the error logs to determine which category to check. Error messages always include the first two letters of the facility code that generated the error.

For more information on the format of an error message, see the messages.readme file in the directory: $II_SYSTEM/ingres/files/english/messages.
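To see at a glance which facilities are reporting errors, you can count error identifiers by their two facility letters. This is a sketch under the same assumption about identifier shape (E_ followed by two facility letters and four hexadecimal digits); the function name facility_summary is hypothetical.

```shell
# facility_summary FILE...
# Hypothetical helper, not part of Vector. Counts error identifiers per
# two-letter facility code and prints lines of the form "FACILITY COUNT".
facility_summary() {
    grep -ohE 'E_[A-Z]{2}[0-9A-F]{4}_' "$@" |
        cut -c3-4 |              # characters 3-4 are the facility letters
        sort | uniq -c |
        awk '{ print $2, $1 }'   # reorder "COUNT FACILITY" to "FACILITY COUNT"
}
```

A log dominated by one facility (for example, DM or SC) suggests which problem category to check first.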

Troubleshoot Startup, Shutdown, or Configuration Problems

Use the following flow chart to isolate a problem with startup, shutdown or configuration of your Vector installation:

Check Vector Installation

To check that the Vector installation is working correctly, follow these steps:

1. Check that you are logged in as the installation owner by issuing the following command at the operating system prompt:

whoami

If the user ID of the installation owner is not shown, log off and log in again as this user.

2. Check that all users have II_SYSTEM set by issuing the following command at the operating system prompt:

echo $II_SYSTEM

/usr/r6 (this varies by system)

All users must have Vector executables in their path variables. Check that everyone has the full search path to $II_SYSTEM/ingres/bin.

The installation owner must also include $II_SYSTEM/ingres/utility.

3. Check that each of the Vector installation variables has a valid value.

Vector environment variables are only used and “seen” by Vector and can be displayed with the following command entered at the operating system prompt:

ingprenv

If you are in doubt about the function or legal value of an environment variable, see the chapter "Setting Environment Variables" and the appendix "Environment Variables."

Vector environment variables denoting installation locations cannot be reset. To change these, you must rerun the installation program, ingbuild, and possibly unload and reload your database with unloaddb. More information is provided in Installation Locations.

4. Check the Vector environment variables that have been set locally, overriding the Vector installation-level definitions. Issue the following commands at the operating system prompt:

BSD:

printenv | grep II

printenv | grep ING

System V:

env | grep II

env | grep ING

Only a small category of Vector environment variables must be defined in the local user environment: those that permit you to access Vector, and those that define values that are different for your local environment. They include TERM_INGRES and ING_EDIT.

If you trace the problem to a Vector environment variable setting, correct the value. For details, see "Setting Environment Variables." If the installation does not start up, continue with this procedure.

5. Identify your installation code. The installation code distinguishes which processes belong to which installation at sites with more than one Vector installation on the same machine. If there is more than one Vector installation on this machine, type the following command at the operating system prompt:

ingprenv | grep II_INSTALLATION

The two-letter installation code is displayed (in this example, R6):

II_INSTALLATION=R6

Take note of your installation code: ______.

6. Check that all Vector processes are shut down. If there are processes that continue to run, see Check Shutdown Problems.

7. Restart Vector: Attempt once again to start up the installation by issuing the following command at the operating system prompt:

ingstart

8. If startup problems persist, continue the diagnostics described in Ingbuild on Linux or Detect Vector Startup Problems.
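The checks on II_SYSTEM and the search path in step 2 can be sketched as a small shell function. This is an illustration only: the names path_contains and check_user_env are hypothetical, and the function inspects only the local shell environment, not the Vector symbol table shown by ingprenv.

```shell
# path_contains DIR
# Succeeds if DIR is one of the entries in $PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# check_user_env
# Hypothetical helper, not part of Vector. Reports a missing II_SYSTEM or
# missing PATH entries and returns non-zero if any problem is found.
# The utility directory is required only for the installation owner.
check_user_env() {
    rc=0
    if [ -z "${II_SYSTEM:-}" ]; then
        echo "II_SYSTEM is not set"
        return 1
    fi
    for dir in "$II_SYSTEM/ingres/bin" "$II_SYSTEM/ingres/utility"; do
        if ! path_contains "$dir"; then
            echo "PATH is missing $dir"
            rc=1
        fi
    done
    return $rc
}
```

Run check_user_env in the shell of the user who reports the problem; no output and a zero exit status mean both directories are on the search path.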

Ingbuild on Linux

The executable script ingbuild performs all the steps necessary to set up an installation. It checks system resources, installs shared memory and semaphores, configures DBMS server parameters, configures the logging and locking system, and starts all the required processes.

The ingbuild program is located in $II_SYSTEM/ingres/utility. It makes use of numerous shell commands as well as the following Vector binary and shell executables:

• createdb

• iilink

• ingstop

• ingstart

• ingprenv

• ingunset

• sql

One of the last things ingbuild does is call the ingstart script to start installation processes. When ingstart is called, it displays the message “Starting the Name Server process (iigcn).” If there are startup problems after this message has displayed, see Detect Vector Startup Problems.

Before you can diagnose a problem with ingbuild, you must identify which subroutine is failing. If you know which routine is failing and it is ingstart or one of the main installation processes (iigcn, iigcc, II_IUSV_nnn, dmfacp or iidbms), see the section below that addresses that executable.

Details on tracing are described in Bourne Shell -x Option.

Detect Vector Startup Problems

To diagnose Vector problems, use the following procedure.

1. Display which processes are running by using the csreport and operating system ps commands.

The csreport utility is described in Operating System Utilities and the ps command is described in Linux Operating System Utilities.

2. Verify that all required Vector system processes are running. The following processes (in the order they are started) are the minimum required for a complete installation:

iigcn

Name Server process

iigcc

Communications Server process (present only on sites with Vector Net)

iidbms (II_IUSV_nnn)

Recovery Server process

iigcd

Data Access Server process (present only if JDBC and/or .NET access is configured)

dmfacp

Archiver process

iidbms

DBMS Server process

Note: If the command ingprenv | grep II_CLIENT shows “II_CLIENT = true”, you need to run only the Name Server and Communications Server processes.

After a Vector database has been accessed, there will also be an iix100 process for that database. If no Vector databases have been accessed, it is normal to see no iix100 processes running. The system databases iidbdb and imadb are not considered Vector databases.

3. If ingstart does not complete successfully, try to identify the reason for startup failure. For example:

• The problem is with ingstart. The ingstart script fails because of the results of its checks for sufficient resources and installation settings. If this is the reason for the startup failure, correct the deficiency.

• A process failed to start. If so, continue to the section on startup problems for that specific process.
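The comparison in step 2 can be sketched as a filter over ps output. The function name missing_processes is hypothetical. You pass the process names you expect for your configuration as arguments, because iigcc and iigcd are present only on some installations. The sketch matches names as whole words anywhere in a ps line, which is an assumption about how the processes appear in your ps output.

```shell
# missing_processes NAME...
# Hypothetical helper, not part of Vector. Reads ps output on standard
# input and prints each NAME that does not appear in it as a whole word.
# Uses grep -w, which GNU and BSD grep provide.
missing_processes() {
    ps_output=$(cat)
    for name in "$@"; do
        if ! printf '%s\n' "$ps_output" | grep -qw -- "$name"; then
            echo "$name"
        fi
    done
}
```

For example, ps -ef | missing_processes iigcn dmfacp iidbms prints nothing when all three names are found. On a machine with more than one installation, also confirm the installation code of each process.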

Detect Vector Startup Problems on System Reboot

To detect Vector startup problems on system reboot on Linux, follow these steps:

1. The most common cause of startup failure following a reboot is failure to include the startup command ingstart in the boot script for your machine. (The boot file is vendor-specific but can be named “/etc/rc” or “/etc/rc.local”.) This file contains the commands that are executed immediately after a reboot.

Make sure that the following line appears in the boot script:

su userid -c "ii_system/ingres/utility/ingstart ii_system" /dev/console

where:

userid refers to the user that owns the installation

ii_system refers to the value of II_SYSTEM for your installation.

2. Make sure that /dev/kmem is readable to the user that owns the installation. If this is a security problem for your machine, you can add this user as a member of /dev/kmem’s group and make the /dev/kmem group readable.

Issue the following command at the operating system prompt:

chmod g+r /dev/kmem

The user that owns the installation must be able to read /dev/kmem, or the kernel resource checks in ingstart fail.

3. Run ingstart. If the installation still does not start, contact technical support, as described in What You Need Before Contacting Actian Support.
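Both reboot checks can be sketched in a few lines of shell. The function name check_boot_setup is hypothetical, the boot file path is whatever your vendor uses, and the sketch only looks for the word ingstart in that file; it does not validate the full su command line. Run it as the user that owns the installation so that the read test reflects that user's permissions.

```shell
# check_boot_setup BOOT_FILE [KMEM_PATH]
# Hypothetical helper, not part of Vector. Reports whether BOOT_FILE
# mentions ingstart and whether KMEM_PATH (default /dev/kmem) is readable
# by the current user. Returns non-zero if either check fails.
check_boot_setup() {
    boot_file=$1
    kmem=${2:-/dev/kmem}
    rc=0
    if ! grep -q 'ingstart' "$boot_file" 2>/dev/null; then
        echo "no ingstart line found in $boot_file"
        rc=1
    fi
    if [ ! -r "$kmem" ]; then
        echo "$kmem is not readable by $(id -un)"
        rc=1
    fi
    return $rc
}
```

For example, check_boot_setup /etc/rc.local checks that file and /dev/kmem.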

Check Shutdown Problems

If Vector has problems during shutdown on Linux, follow these steps:

1. Check environment variables in the local user environment by entering the following commands at the operating system prompt.

a. Verify that II_SYSTEM is set correctly:

echo $II_SYSTEM

/usr/r6 (this varies system by system)

b. Check that you have the full search path to $II_SYSTEM/ingres/bin and $II_SYSTEM/ingres/utility:

echo $PATH

2. If there is more than one Vector installation on this machine, identify your installation code by typing the following command at the operating system prompt:

ingprenv | grep II_INSTALLATION

The two-letter installation code is displayed (for example, the code here is JB):

II_INSTALLATION=JB

Take note of your installation code: ______.

3. Identify if you are shutting down a client installation or a full installation by issuing the following command at the operating system prompt:

ingprenv | grep II_CLIENT

If this displays “II_CLIENT=true”, this is a client installation, and only two processes (iigcn, iigcc) are running on this node.

4. If you are having trouble shutting down a client installation because Vector believes a Communications server is running locally when there is none, remove this file:

$II_SYSTEM/ingres/files/name/clientname/IICOMSVR_clientname

For troubleshooting details see the Connectivity Guide.

5. Identify whether Vector is recovering aborted transactions. Details on recovery delays are described in Recovery Process Monitoring. Issue the following command at the operating system prompt and examine the output for the word “RECOVER”:

logstat | grep RECOVER

If Vector is recovering aborted transactions, wait for this process to finish. Continue reissuing the logstat command and examining the STATUS field. When it says: “ONLINE, ECP DONE,” proceed with normal shutdown.

6. Now shut down the installation:

ingstop

Note: All users must be logged out of Vector (that is, no sessions running in the server) for the ingstop script to succeed.

7. Check that all processes are shut down:

a. Display running processes by issuing the following command at the operating system prompt:

BSD:

ps -aux | grep ingres

System V:

ps -ef | grep ingres

b. If shutdown succeeds, none of the following processes are running:

• iigcn

• iigcc

• II_IUSV_nnn

• dmfacp

• iidbms

• iijdbc

• iigcd

• iistar

• rmcmd

• iigcb

• iix100

8. If any of these installation processes are not shut down, note the process ID of the running processes and do the following:

a. Shut them down manually with the operating system command:

kill -QUIT process_id

where:

process_id refers to the process ID of the process to stop.

b. If processes are still not shut down, issue the operating system command:

kill -9 process_id

IMPORTANT! If your site has more than one Vector installation, examine the installation code associated with the iidbms process to make sure you are stopping only the processes associated with the installation you need to shut down.

9. Check that no shared memory segments remain allocated to this installation. Execute the operating system command:

csreport

The following message indicates that shared segments have been properly removed:

!Can’t map system segment

10. If shared memory segments remain for this installation, interactively remove them:

a. To deallocate shared memory resources, issue the operating system command:

ipcclean

b. Use the Linux command ipcs to verify that the actual segments have been removed:

ipcs

c. If they have not been properly removed, you must delete them manually.

IMPORTANT! If your site has more than one Vector installation, you must take care to only remove shared memory or semaphores for the installation to be shut down. If your machine contains more than one installation, enter the following command from the environment of the installation you need to target:

csreport

The csreport utility displays the shared memory and semaphore segment identifiers for this installation.

To remove the targeted segment(s), use the Linux command:

ipcrm -m mid

ipcrm -s sid

where mid is the shared memory segment identifier and sid is the semaphore identifier.

11. Verify that the following files are not present. If they are still present, you must remove them:

$II_SYSTEM/ingres/files/memory/lockseg.mem
$II_SYSTEM/ingres/files/memory/sysseg.mem

Your installation is now shut down. Instructions on how to check and restart Vector are described in Check Vector Installation.
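Two parts of the shutdown procedure lend themselves to a short script: waiting for recovery to finish (step 5) and stopping a leftover process with QUIT before resorting to kill -9 (step 8). The following is a sketch only. The function names recovering and stop_pid are hypothetical, and the grace period is an arbitrary choice. On a machine with more than one installation, confirm that a process ID belongs to the installation you are shutting down before passing it to stop_pid.

```shell
# recovering
# Hypothetical helper. Reads logstat output on standard input and
# succeeds if the word RECOVER appears in it.
recovering() {
    grep -q 'RECOVER'
}

# stop_pid PID [GRACE_SECONDS]
# Hypothetical helper. Sends QUIT to PID, waits up to GRACE_SECONDS
# (default 10) for the process to exit, then sends KILL (signal 9).
stop_pid() {
    pid=$1
    grace=${2:-10}
    kill -QUIT "$pid" 2>/dev/null
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # process has exited
        sleep 1
        grace=$((grace - 1))
    done
    kill -9 "$pid" 2>/dev/null || true           # last resort
}
```

For example, the loop while logstat | recovering; do sleep 10; done returns once recovery is no longer reported, after which you can run ingstop. After using stop_pid, check again with ps that the process is gone.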

Vector Processes

The major Vector processes are as follows:

• X100 process (iix100)

• Name Server process (iigcn)

• Communications Server process (iigcc)

• Recovery process (II_IUSV_nnn)

• Archiver process (dmfacp)

• DBMS Server process (iidbms)

• Data Access Server process (iigcd)

• Bridge Server process (iigcb)

• Remote Command process (rmcmd)

Name Server Errors

The Name Server process (iigcn) is not running if either of the following occurs:

• You receive a specific error indicating that the Name Server process (iigcn) failed to start.

• The command ps -aux (BSD) or ps -ef (System V) shows that iigcn is not running.

You can verify this by attempting to start the Name Server manually.

Check for Name Server Errors

If the Name Server does not start, follow these steps:

1. Verify that TCP/IP is properly installed by typing the following command at the operating system prompt:

telnet localhost

A loopback login to your machine occurs.

2. Verify that the required TCP daemon process for your operating system is running.

The specific process name is system dependent, but on many Linux systems, the process is named “inetd” (use your process name in the command below if it is not inetd). Issue the following command at the operating system prompt, or see your operating system manual for your TCP/IP implementation:

BSD:

ps -aux | grep inetd

System V:

ps -ef | grep inetd

3. Check that the Vector environment variable II_GCNxx_PORT is not set (this environment variable contains the TCP port identifier of the Name Server process):

a. Use the ingprenv utility to verify that this environment variable is not set when the Name Server tries to start up.

b. If necessary, use the ingunset command to unset the II_GCNxx_PORT environment variable.

4. If you corrected a Name Server problem, verify that Vector starts normally:

a. Shut down the partially started installation with the ingstop command.

b. Restart the installation with the ingstart command.

5. If you are still having problems, set the following trace to capture additional diagnostic data before calling technical support:

Bourne Shell:

II_GC_TRACE=5

II_GC_LOG=stdio (specify stdio or a file name)

export II_GC_TRACE II_GC_LOG

iirun iigcn

C Shell:

setenv II_GC_TRACE 5

setenv II_GC_LOG stdio (specify stdio or a file name)

iirun iigcn

Check for Communications Server Process Errors

If the Communications Server process (iigcc) did not start, follow this procedure:

1. Verify that no local environment variables are set in the local user environment that contain the string “GCC”. At the operating system prompt, type the command:

BSD:

printenv | grep GCC

System V:

env | grep GCC

The Vector environment variable II_GCNxx_PORT is set in the Vector symbol table. It is not visible from the Linux environment using the printenv or env commands but is visible in the Vector environment. Verify this by typing ingprenv.

2. Check the value of the Vector environment variable II_RUN with the following command entered at the operating system prompt:

ingprenv

a. If this machine is running Ingres Net, the value of II_RUN is either “,NET” or “,DBMS, NET”. If the value of the Vector environment variable II_CLIENT is TRUE, II_RUN is “,NET”. If II_RUN is not set correctly, reset it using the command:

ingsetenv II_RUN ', DBMS, NET'

b. If this machine is an NFS client (that is, does not run a DBMS Server locally) the value of II_RUN is “,NET”. If II_RUN is not set correctly, reset it using the following command entered at the operating system prompt:

ingsetenv II_RUN ', NET'

For details on setting environment variables, see the chapter "Setting Environment Variables."

3. If you are still having problems, set the following trace to capture diagnostic data and attempt to restart the Communications Server:

Bourne Shell:

II_GCC_TRACE=4

II_GCA_LOG=stdio

export II_GCC_TRACE II_GCA_LOG

iirun iigcc

C Shell:

setenv II_GCC_TRACE 4

setenv II_GCA_LOG stdio

iirun iigcc

4. If you corrected a Communications Server problem, verify that Vector starts normally:

a. Shut down the partially started installation with the ingstop command.

b. Restart the installation with the ingstart command.
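When you check II_RUN and II_CLIENT in step 2, it can be convenient to pull a single value out of the ingprenv listing. The following sketch assumes that ingprenv prints one NAME=value pair per line, with optional spaces around the equals sign; the function name symbol_value is hypothetical.

```shell
# symbol_value NAME
# Hypothetical helper, not part of Vector. Reads NAME=value lines on
# standard input and prints the value of the first line that defines NAME.
# Prints nothing if NAME is not defined.
symbol_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" | head -n 1
}
```

For example, ingprenv | symbol_value II_RUN prints the value held in the Vector symbol table, and env | symbol_value II_RUN prints any local override. Empty output means the variable is not set in that place.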

Check for Bridge Server Process Errors

If the Bridge Server process (iigcb) did not start, verify that you have installed the Ingres Net component and that you have installed the correct protocol drivers.

If you are still having problems, set the following trace to capture additional diagnostic data before calling technical support:

ingsetenv II_GC_TRACE 4

ingsetenv II_GC_LOG filename

ingstart -iigcb

Recovery Process Errors

The recovery process (dmfrcp) must be running before a DBMS server can be started. Failure of the dmfrcp process to start indicates one of the following:

• An improper installation configuration exists, as described in Troubleshoot Startup, Shutdown, or Configuration Problems.

• Problems with the log file

• Insufficient or previously allocated shared memory

Check for Recovery Process Errors

If the recovery process does not start, perform the following procedure:

1. Check that the shared memory resources are properly installed. Use the csreport utility and check that the size, ownership, and permissions of the semaphore and shared memory segments meet the minimum requirements for your port.

2. Check for the existence of the transaction log by opening the Primary Transaction Log window in Configuration Manager (or the Transaction Log screen in the Configuration-By-Forms (cbf) utility), and noting the directories listed in the Log File Root Locations table in Configuration Manager (or the Primary Transaction Log Locations table in cbf).

3. Look for the file ingres_log.lnn (where nn is an integer between 1 and 16) in the ingres/log directory (located below all other listed directories).

4. Make the following checks on the transaction log file ingres_log.lnn:

a. Verify that ingres_log.lnn exists at that location by entering the following command at the operating system prompt:

ls -l

If it does not exist, you must recreate it using the Configuration-By-Forms (cbf) or Configuration Manager (vcbf) utility.

b. Verify that ingres_log.lnn is owned by the installation owner.

If it is not, issue the following command at the operating system prompt, where userid is the user ID of the installation owner:

chown userid ingres_log.lnn

c. If the transaction log file was created as:

An ordinary Linux file, make sure it has permissions 660 (that is, “-rw-rw----”). If not, issue the following command at the operating system prompt:

chmod 660 ingres_log.lnn

A raw log, make sure its permissions are “crw-------”.

5. Verify that Vector starts normally.

a. Shut down the partially started installation with the ingstop command.

b. Restart the installation with the ingstart command.

6. If you still cannot start the II_IUSV_nnn process, you need to completely shut down the installation, re-run ingbuild, and reconfigure the log file.

IMPORTANT! This step must only be done as a last resort. Keep in mind that this reinitializes the log file, and any outstanding transactions are lost.

To shut down the system, complete the following steps:

a. Issue the command ingstop.

b. Work through Check Shutdown Problems until your Vector installation has been cleanly shut down.
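The ownership and permission checks in step 4 can be sketched as follows. The function name check_log_file is hypothetical, and the sketch reads the owner and mode from ls -ld output. It checks the mode only for an ordinary file and deliberately skips the mode check for a raw (character device) log.

```shell
# check_log_file FILE OWNER
# Hypothetical helper, not part of Vector. Verifies that FILE exists, is
# owned by OWNER, and, if it is an ordinary file, has mode 660.
# Returns non-zero and prints a message for each failed check.
check_log_file() {
    file=$1
    owner=$2
    if [ ! -e "$file" ]; then
        echo "$file does not exist"
        return 1
    fi
    listing=$(ls -ld "$file")
    mode=$(printf '%s\n' "$listing" | cut -c1-10)    # for example -rw-rw----
    actual_owner=$(printf '%s\n' "$listing" | awk '{ print $3 }')
    rc=0
    if [ "$actual_owner" != "$owner" ]; then
        echo "$file is owned by $actual_owner, expected $owner"
        rc=1
    fi
    case "$mode" in
        -rw-rw----) ;;    # ordinary file with permissions 660
        c*) ;;            # raw log: mode not checked in this sketch
        *) echo "$file has mode $mode, expected -rw-rw----"; rc=1 ;;
    esac
    return $rc
}
```

Pass the full path of the transaction log file and the user ID of the installation owner. No output and a zero exit status mean the checks passed.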

Check for Remote Command Process

You can check for the presence of the optional remote command process by entering the following command at the operating system prompt:

ps -ef | grep rmcmd

Archiver Process

The archiver process (dmfacp) does not start unless the recovery (dmfrcp) process is running. However, an installation runs without an archiver process until the log file fills up. User programs are suspended as outstanding transactions in the log file are backed out. For information about the Vector recovery state, see Recovery Process Monitoring.

Archiver process (dmfacp) startup errors are likely to result from:

• Improper shared memory resources

• Inability to read the transaction log file

• Inability to write journal files

Check for Archiver Process Errors

Use the following procedure to check archiver process startup problems. Some of these checks are the same as for the recovery process:

1. Check that the shared memory resources are properly installed: Use the csreport utility and check that the size, ownership and permissions of the semaphore and shared memory segments meet the minimum requirements for your port. For requirements, see the Readme file.

The command to allocate Vector shared memory and semaphores (when logged in as the installation owner) is csinstall. You can display them from Linux with the command ipcs.

2. Make the following checks on the transaction log file “ingres_log”:

a. Verify that “ingres_log” exists at that location by entering the following command at the operating system prompt:

ls -l

If it does not exist, you must create it by running the ingbuild program.

b. Verify that “ingres_log” is owned by the user that owns the installation.

If not, issue the following command at the operating system prompt, where userid is the user who owns the installation:

chown userid ingres_log

c. If the transaction log file was created as:

An ordinary Linux file, make sure it has permissions 660 (that is, “-rw-rw----”). If not, issue the following command at the operating system prompt:

chmod 660 ingres_log

A raw log, make sure its permissions are “crw-------”.

Check that the II_JOURNAL location is a valid location. Issue the following command at the operating system prompt:

infodb | grep ii_journal

3. Check journal locations by verifying that:

a. The journal location name points to a valid directory containing subdirectories “ingres/jnl/default/dbname”

b. The permissions on these directories are:

- 755 for the ingres directory
- 777 for the default directory

4. Check that the disk partition containing the journal files is not 100% full, for example by using the operating system command df on the journal directory.

If a journal partition is 100% full, it is impossible to write journals and the Archiver stalls. If this is the reason preventing Archiver startup, you must either free space on the journal partition or temporarily disable journaling with the alterdb command.

5. If you corrected an Archiver process problem, verify that Vector starts normally:

a. Shut down the partially started installation with the ingstop command.

b. Restart the installation with the ingstart command.
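The journal partition check in step 4 can be sketched with the POSIX output format of df. The function name partition_use_percent is hypothetical, and the sketch assumes the file system name reported by df contains no spaces.

```shell
# partition_use_percent DIR
# Hypothetical helper, not part of Vector. Prints the percentage of space
# used on the file system that holds DIR, without the percent sign.
partition_use_percent() {
    # df -P prints one header line, then one line per file system;
    # the fifth column is the capacity, for example 87%.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

Pass the directory that the journal location points to. A result of 100 means the Archiver cannot write journals until space is freed.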

Check for DBMS Server Process Errors

The command to start the DBMS Server process (iidbms) from the installation owner login is ingstart -iidbms. If the DBMS Server did not start:

1. Verify that the recovery process (dmfrcp) is running. Details are described in Recovery Process Errors.

2. Verify that the recovery process is not in a recovery state. (See Recovery Process Monitoring.) This is likely if there was a sudden shutdown because of a power failure or other system failure.

3. Try to start up a DBMS server. Issue the following command at the operating system prompt:

ingstart -iidbms

4. If you corrected a DBMS server problem, verify that Vector starts normally:

a. Shut down the partially started installation with the ingstop command.

b. Restart the installation with the ingstart command.

Check for Data Access Server Process

The command to start the Data Access Server process (iigcd) from the installation owner login is ingstart -iigcd. If the Data Access Server did not start:

1. Verify that the recovery process (dmfrcp) is running. For more information, see Check for Recovery Process Errors.

2. Verify that the recovery process is not in a recovery state. (See Recovery Process Monitoring.) This is likely if there was a sudden shutdown because of a power failure or other system failure.

3. Try to start up a Data Access server. Issue the following command at the operating system prompt:

ingstart -iigcd

4. If you corrected a Data Access server problem, verify that Vector starts normally:

a. Shut down the partially started installation with the ingstop command.

b. Restart the installation with the ingstart command.

Problems with Tools Startup

The following flow chart helps you isolate a problem when starting a Vector tool:

DBMS Server Stopped

Once started with ingstart, the DBMS Server process must continue running until the ingstop or iimonitor command is issued to stop it. If the DBMS Server stops running (“dies”) for any other reason, report it to technical support along with the associated error log messages and, if possible, the cause of the DBMS Server stopping.

1. Document error log entries associated with the process death. Details on reading the log files are described in Check the Error Log Files. Save all errors for technical support.

2. Isolate the reason your DBMS Server process died.

a. Isolate which operations, application, query, and tables are needed to duplicate the problems. See the copyapp and unloaddb command descriptions in the Command Reference.

b. Save this to make a test case for technical support.

3. If the immediate cause cannot be isolated, perform long-term diagnostics with II_DBMS_LOG. This diagnostic tool is especially valuable for fatal DMF errors and Vector server startup or shutdown problems.

a. Set II_DBMS_LOG to capture a “snapshot” of the DBMS Server when it stops by setting it to the full path name of a file before starting the DBMS Server. For example:

C Shell:

setenv II_DBMS_LOG $II_SYSTEM/ingres/files/dbms_%p.trace

Bourne Shell:

II_DBMS_LOG=$II_SYSTEM/ingres/files/dbms_%p.trace

export II_DBMS_LOG

At startup, the %p in the II_DBMS_LOG specification is replaced by the Process Identifier (PID) of the server process. This prevents DBMS servers from clobbering each other's logs (or the recovery process log).

b. When the DBMS Server shuts down, information is dumped to the DBMS log file. You must rename the file before restarting Vector or the new server overwrites the file. Prepare to send this file along with associated errors from the error log files to technical support for analysis.
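Renaming the trace file before a restart is easy to forget, so it can help to script it. The following sketch moves a file to a name with a time stamp appended; the function name preserve_trace and the time stamp format are illustrative choices, not Vector conventions.

```shell
# preserve_trace FILE
# Hypothetical helper, not part of Vector. Renames FILE to
# FILE.YYYYMMDDhhmmss so that a restarted server cannot overwrite it,
# and prints the new name. Returns non-zero if FILE is not a regular file.
preserve_trace() {
    src=$1
    [ -f "$src" ] || return 1
    dest="$src.$(date +%Y%m%d%H%M%S)"
    mv -- "$src" "$dest" && printf '%s\n' "$dest"
}
```

Run preserve_trace on each trace file written by the stopped server, then collect the renamed files together with the associated error log entries.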

Database Connection Problems

Database connection problems occur in the following scenarios:

• If you cannot connect to any database, including iidbdb

• If you encounter errors while connecting to individual databases

No Database Connections

If you cannot connect to any database, follow these steps:

1. Check the errlog.log to see if there are any associated messages. These messages are often more informative than the message displayed on your screen and can quickly identify the source of failure. Use the timestamps to find associated messages in the errlog.log.

2. Ensure that all your installation’s processes are running, as described in Check Vector Installation.

3. Run logstat to ensure that the logging system status is ONLINE and not LOGFULL. If the status is not ONLINE, see Diagnose Logging System Problems.

4. Use iinamu to interrogate the Name Server. (For additional information, see iinamu in the Command Reference). Type show ingres to verify that the DBMS Server has registered with the Name Server.

5. Use the iimonitor utility to see if you can connect to the DBMS Server. See iimonitor in the Command Reference.

If you can connect, at the “IIMONITOR>” prompt type show sessions to examine DBMS server activity.

If you cannot connect to the DBMS Server, see Diagnose Logging System Problems.

Check the Vector environment variable/logical II_DBMS_SERVER:

Use ingprenv and your system’s env or printenv command to verify that II_DBMS_SERVER is not set, either in the Vector symbol table or your local environment.

6. Restart the Name Server if II_DBMS_SERVER works.

7. If the error condition persists, contact technical support, as described in What You Need Before Contacting Actian Support.

Individual Database Connection Failure

If you can connect to some databases but not others:

1. Check the errlog.log for database connection error messages. These messages can quickly identify the source of the failure.

2. Check vectorwise.log for iix100 server startup error messages.

3. Check that the database is not exclusively locked by another user.

4. Check database permissions and ownership, as described in Check Vector Installation.

Verify that your database still exists and is not being recovered by the recovery system.

5. Check if the database is listed:

Type catalogdb and choose Databases.

Using Actian Director: Select the database from the Instance Explorer, and then select Properties, Information. Make sure the database is not listed as inconsistent.

Using VDBA: Select a server and click Connect DOM. Select Database from the drop-down menu and choose infodb. Make sure the database is not listed as inconsistent and that the status indicated is “VALID.”

Inconsistent Databases and Recovery

An inconsistent database occurs when administrative changes to a database in the transaction log do not agree with information maintained in the database’s configuration file.

The main causes of inconsistent database errors are improper system administration procedures. These include:

• Initializing the transaction log file with the -force_init_log flag of rcpconfig while the log file still contains open transactions

• Moving or altering files or installation variables without using the appropriate utility such as ingbuild or unloaddb

• Improper procedures when recovering Vector data from operating system backups

Inconsistent database errors can also be caused by hardware or software problems, for example when a hardware failure corrupts the transaction log file or the configuration file, or when a software bug is involved. In either case, contact technical support.

Automatic Recovery

Vector automatically handles the transaction failures that cause most database inconsistencies.

Recovery During Normal Operation

If a user program exits or a transaction is aborted for some other reason, the DBMS Server automatically handles transaction rollback. This does not cause an inconsistent database.

Recovery at Shutdown

At shutdown, all users must have exited their sessions; therefore, all transactions are committed. If users exited their sessions abnormally, the DBMS Server aborts any open transactions associated with the aborted sessions. Very long transactions take time to roll back and can make ingstop appear to hang. The DBMS Server process cannot exit normally until it finishes recovering the aborted transactions.

If transactions are being rolled back on shutdown, allow the DBMS Server to finish this task before shutting down. If you do not, longer delays occur at startup time while the recovery process is performing rollback.

Recovery at Startup

If transactions have been aborted and were not recovered by a normal shutdown, upon restart the recovery process performs recovery. This occurs, for example, if:

• Processes are forcibly killed from the operating system

• The machine is rebooted

• Power to the system is interrupted

The recovery process performs the following steps upon startup:

1. Reads the transaction log file. If there has not been a normal shutdown, the recovery process detects that databases are inconsistent—that is, that Vector previously exited without completing all the transactions required for system and database consistency.

2. Proceeds through the transaction log file to back out uncommitted transactions and complete committed fast-commit transactions until the databases are again in a consistent state. While recovery is proceeding, no user interfaces can connect to a database.

Recovery actions are logged in the file $II_SYSTEM/ingres/files/iircp.log.

Recovery Process Monitoring

If you are monitoring Vector startup after a machine reboot, the following messages are displayed:

Starting Vector Name Server...
Starting Vector Communications Server...
Starting Vector Recovery Process...

If the transaction log contained uncommitted transactions when the machine failure occurred, the startup script pauses while the recovery process recovers transactions from the transaction log file. No messages are printed to the screen.

If you are in doubt as to whether recovery is taking place during startup, or to monitor the recovery process, use the following procedure.

Display the recovery process log file by typing the following command at the operating system prompt:

tail -f iircp.log

If the system is recovering, the recovery actions are logged to the iircp.log file. This indicates that Vector is automatically recovering from possible inconsistencies.

Messages are printed to the log file during recovery:

• A message at the start indicates that transaction recovery has begun.

• Intermediate messages track recovery progress (for example, “Recovered 31 of 130 transactions”).

• When done, the following message is printed:

Recovery complete.
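Instead of watching tail -f by hand, you can poll the log for the completion message. This is a minimal sketch with hypothetical helper names; it assumes only that the message text "Recovery complete" appears in the log, matched case-insensitively. The 10-second default interval is an arbitrary choice.

```shell
# Exit 0 once the completion message appears in the given recovery log.
recovery_complete() {
    grep -qi 'recovery complete' "$1"
}

# Poll the log (first argument) every N seconds (second argument,
# default 10) until the completion message is found.
wait_for_recovery() {
    until recovery_complete "$1"; do
        sleep "${2:-10}"
    done
    echo "Recovery complete."
}
```

Note that this sketch does not distinguish a message left by an earlier recovery from the current one, so check the timestamps in the log before relying on the result.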

You can also use an operating system command to determine whether the recovery process is recovering transactions by checking to see if it is accumulating CPU time.

On Linux, you can also monitor the files in the database directory of the database you suspect of being the target of the updates that are being backed out. The following command entered at the operating system prompt shows whether data files are being updated:

ls -lt

The file most recently updated is listed first along with the time of last update.

If any of the monitoring techniques above indicate that transaction recovery is taking place, continue to monitor the recovery process until recovery has completed. When the recovery process is complete, CPU time is not accumulated.

After the recovery process has finished, restart the installation with the ingstart command. The ingstart utility first shuts down and then brings up all required installation processes. Programs can then connect to the databases.

Inconsistent Database

If you receive an “inconsistent database” error after recovery is complete, it means updates and modifications were not properly completed or rolled back, and the database is therefore in an inconsistent state.

Following are examples of “inconsistent database” errors that indicate your database has become inconsistent:

E_DM0100 DB_INCONSISTENT Database is inconsistent
E_US0026 Database is inconsistent. Please contact the ingres system manager
E_DM9327 BAD_OPEN_COUNT

Diagnose an Inconsistent Database

Diagnose the cause and extent of an inconsistent database problem before you attempt to recover your database. Knowing the cause of the problem is essential to choosing the proper recovery procedures. Once a database has been rolled forward from a checkpoint, recovered from an operating system backup, or forced consistent, you cannot determine the cause of inconsistency.

To diagnose the cause and extent of an inconsistent database problem:

1. Read and save the full text of the error messages in errlog.log and iircp.log.

2. Run the infodb command (from the operating system prompt or from the Database menu in Actian Director or VDBA) to read the database's configuration file and identify the cause of inconsistency.

If the configuration file can be opened and read, the cause of the inconsistency is displayed. Save the output of infodb for technical support.

If the database's configuration file, "aaaaaaaa.cnf", cannot be read, it is corrupted. You need to recover from a backup, as described in Recover an Inconsistent Database.

3. Review the history of your Vector installation. Look for improper system administration procedures that have caused the database to become inconsistent. See the table in Common Causes of Inconsistent Databases.

4. Report your problem to technical support. If the inconsistency was not caused by incorrect system administration procedures, hardware failure, or known operating system software bugs, record the information, as discussed in What You Need Before Contacting Actian Support.

Common Causes of Inconsistent Databases

Common causes of inconsistent databases are:

• Operating system backups

• Incorrect installation paths

• Disabling of logging/recovery system

• Use of unsupported hardware configuration

Inconsistencies Due to Operating System Backups

To recover a database from an operating system backup that was made while the installation was running, see Recover an Inconsistent Database.

Inconsistencies Due to Incorrect Installation Paths

Changing Vector installation variables (such as II_SYSTEM, II_DATABASE, II_CHECKPOINT, II_JOURNAL, II_DUMP, or II_WORK) without using proper procedures causes inconsistency between the information stored in the installation variables and that stored in the database configuration file “aaaaaaaa.cnf”.

Database inconsistency can occur if you move a database, table, application or some other object by using operating system commands rather than the supported Vector utilities. If the inconsistency is the result of moving a database from another location or installation without using unloaddb, you must remove the database using destroydb, recreate the database using createdb, and repopulate the database using the unloaddb utility.

A database file can become corrupted from hardware or software failures of various kinds. A data file can be inadvertently deleted by hand, but this is rare because only the user who owns the installation can write to the database directories.

If you are in doubt about whether transactions are being recovered, run the logstat utility and examine the “Status” field. It is marked RECOVER if in the recovery state. While a recovery is taking place, for example when restarting after a system failure, the recovery process requires time to read through the transaction log file to back out uncommitted transactions and complete fast-commit transactions. To users, the system appears to hang.

Examine Configuration File of a Database

To examine the configuration file of your database, enter the infodb command at the command prompt. You can also use the Database menu in Actian Director or VDBA.

1. Compare the path information for the checkpoint, journal, data and dump locations with that defined for these environment variables as displayed by the following command:

ingprenv

2. If the values are not the same, the installation variables have been changed or the database has been imported from some other Vector installation. If the installation variables have changed, return them to the values displayed by infodb.

3. If you need to change the existing values of Vector installation variables or import a database from another site, you must use the unloaddb utility, as this creates a new, up-to-date configuration file for the database. For a discussion of the Vector environment variables that cause an inconsistent database if changed after installation is completed, see the chapter “Environment Variables.”

Recovery Rules

The following are rules that you should keep in mind about the recovery of transactions:

• It takes at least as long to recover aborted transactions as it took to execute them originally.

• The amount of time required for recovery depends on the number of users and transactions, transaction semantics (whether autocommit is set), and the consistency point interval.

• While recovery is proceeding, all users are denied access to databases. Any attempt to connect to a database at this point returns an error such as the following:

E_LQ0001_STARTUP gca protocol service request failure.

• Database inconsistency can occur if a user or the system administrator attempts to “force” entry into the installation by running rcpconfig with the -force_init_log flag (thus erasing the transaction log file) before the recovery system has finished rolling back the uncommitted transactions during recovery.

After a system failure, monitor recovery and always allow it to proceed until the “Recovery complete” message appears in iircp.log.

Inconsistencies Due to Disabling of Logging or Recovery System

By disabling the logging or recovery system, the DBA can temporarily turn off logging for the database to speed bulk loading of data. If logging has been turned off for this database, a NOLOGGING error message appears in the error log file. Typically this message is:

E_DM9050_TRANSACTION_NOLOGGING Database dbname has been updated by a session running with SET NOLOGGING defined.

If the database has become inconsistent, you can check for this error message by typing the following command:

grep DM9050 $II_SYSTEM/ingres/files/errlog.log

If the NOLOGGING error message appears, logging was disabled on this database. If the NOLOGGING message in the error log was written later than the most recent checkpoint of this database, the database must be restored from the checkpoint. To determine if this is the case, compare the timestamp on the error message in errlog.log with the timestamp in the “checkpoint history” field of the output from the command infodb dbname.

For details on set nologging, see the User Guide.

Database Inconsistencies Due to Use of Unsupported Hardware Configurations

Database inconsistencies can be caused by using unsupported hardware configurations on NFS. In systems that include Network File System (NFS) mounts, be aware that Vector:

• Supports NFS client installation configurations in which the DBMS Server process and data directories on one node are accessed by application programs executing on another.

• Does not support running DBMS servers on one node and accessing data directories on another network node using NFS. The configuration can cause undetected write errors that lead to database inconsistency.

To check your configuration, type mount at the operating system prompt. Make sure that the data directories (II_DATABASE, II_CHECKPOINT, II_JOURNAL and II_LOG_FILE) are not NFS-mounted from a remote node.
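Reading mount output by eye is error-prone when many file systems are mounted. The following sketch finds the mount that holds a given directory and reports whether it is NFS. The helper names are hypothetical, and the code assumes Linux-style mount lines of the form "device on /mountpoint type fstype (options)" with no spaces in the mount point.

```shell
# Print the file system type of the mount that holds the directory given
# as the first argument. Reads mount output on stdin.
fs_type_for() {
    awk -v path="$1" '
        $2 == "on" && $4 == "type" {
            mp = $3
            # A mount point matches if it is the path or a parent of it.
            prefix = (mp == "/") ? "/" : (mp "/")
            if (index(path "/", prefix) == 1 && length(mp) >= best_len) {
                best_len = length(mp)   # keep the longest matching mount
                best = $5
            }
        }
        END { if (best != "") print best }
    '
}

# Exit 0 if the directory lives on an NFS mount (nfs, nfs4, and so on).
is_on_nfs() {
    case $(fs_type_for "$1") in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}
```

For example, mount | is_on_nfs /path/to/data exits with status 0 when the directory is NFS-mounted. Repeat the check for each location reported by ingprenv.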

Recover an Inconsistent Database

The recommended method of recovering an inconsistent database is to use the Restore operation in Actian Director or rollforwarddb operation in VDBA. You can also enter the rollforwarddb command from the command line.

If no Vector checkpoint exists, you can recover from an operating system backup.

Make Inconsistent Database Consistent

The recommended way to make your inconsistent database consistent is to use rollforwarddb. It recovers the database from a previous checkpoint and, if journaling was enabled, applies the associated journals. For the full procedure, see the User Guide.

Use of Operating System Backup

We neither support nor recommend the use of operating system backups as your primary means of ensuring database recoverability. The Vector checkpoint and journaling programs provide the secure way to ensure that your data is recoverable.

IMPORTANT! Operating system backups must be used only as a last resort, when Vector checkpoints have been lost or destroyed, and only under the direction of technical support.

When No Backup Exists

If you have an inconsistent database for which no checkpoints or operating system backups exist, you can still gain access to that database and attempt to salvage the data using the verifydb utility.

The verifydb utility can be used to unset the “inconsistent database” flag in the configuration file “aaaaaaaa.cnf”. This permits access to the database; it does not, however, make the data consistent. If verifydb is used to force access to a database that is inconsistent, the state of the database remains unknown. Such a database becomes unsupportable by technical support. Data can be lost, and problems can surface weeks or months later. Technical support cannot diagnose the state of such a database because the built-in consistency checks have been overridden.

The format “verifydb -oforce_consistent” does not recover a database. It merely allows access to, and continued operation of, a database that is in an inconsistent state.

Gain Emergency Access to an Inconsistent Database Using verifydb

If you must use verifydb to gain emergency access to data in an inconsistent database, do so as follows:

1. Save all information, as outlined in What You Need Before Contacting Actian Support.

2. Back up the database directory at the operating system level.

3. Run verifydb in report mode by typing the following command at the operating system prompt:

verifydb -mreport -sdbname "dbname" -odbms_catalogs '-u$ingres'

Verifydb output is logged in $II_SYSTEM/ingres/files/iivdb.log.

4. To repair inconsistencies in the system catalogs interactively and force the database consistency flag, type the following commands at the operating system prompt:

verifydb -mruninteractive -sdbname "dbname" -oforce_consistent '-u$ingres'

verifydb -mruninteractive -sdbname "dbname" -odbms_catalogs '-u$ingres'

5. Call Technical Support if additional assistance is required to resolve the inconsistency.

Performance Problems

Most performance problems stem from multiple causes. A complete performance analysis must include each item in the Diagnostic Hierarchy section below. But if you need to resolve a specific performance problem quickly, focus your attention on that area. The information in this troubleshooting section is designed to assist you in resolving a performance issue after first isolating which factors are influencing performance.

Use the procedures in this section if a Vector tool “hangs,” that is, your program seems to start but nothing happens. Other programs such as SQL display header information, but when you issue a command, nothing happens.

Flow Diagram for Troubleshooting Performance Problems

Use the following flow chart to identify and isolate a system performance problem:

Diagnose Logging System Problems

Use the following procedure to diagnose a DBMS server that is not responding due to logging system problems.

1. Check the logging system by issuing the following command at the operating system prompt to invoke the logstat utility:

logstat | more

2. If you are unable to start up logstat, the recovery process has probably taken an exclusive lock and is in recovery. (See Recovery Process Monitoring.)

3. If logstat starts up, check the status field, as described in Logstat Status Fields.

Logstat Status Fields

The problem states and resolutions of logstat fields are described in the following table:

Status Field	Description	Action
ONLINE, ECPDONE	A consistency point is completed. The system is fully functional and online.	The logging system has finished a consistency point. This status flag is present most of the time while the logging system is functioning normally. If this is your status, stop this procedure and review Identifying Operating System Resource Problems.
LOGFULL	The log file is full.	A status of LOGFULL means that the system is suspended from processing new requests because there is no more room in the transaction log file. It remains so until the problem is corrected. Examine the other status field entries. If ARCHIVE is also indicated, the Archiver is actively processing the log file to free up space. The LOGFULL condition is removed when the Archiver is finished.
CPNEEDED	A consistency point is needed.	CPNEEDED means the logging system is about to take a consistency point.
ARCHIVE	The Archiver is processing.	The status ARCHIVE means the Archiver process is archiving.
START_ARCHIVER	The Archiver has stopped.	A status of START_ARCHIVER means that the archiver process stopped. Restart the Archiver.
FORCE_ABORT	The force-abort-limit has been reached.	If the status is FORCE_ABORT, a transaction came too close to the end of the transaction log file before the oldest transaction was committed to disk, and the force-abort-limit was reached. The oldest open transaction is aborted to free space for the new one. You can use logstat (or ipm) to monitor the force abort. The system frees the FORCE_ABORT status when the abort operation is completed.
RECOVER	The recovery process is performing recovery.	If the status is RECOVER, the logging system is in recovery, a normal mode. Recovery requires some time because Vector is optimized to commit rather than back out of transactions.

IMPORTANT! If the server is recovering a lengthy transaction, be aware that shutting down the server with iimonitor's "stop server" (or by killing processes at the operating system level) results in slower recovery. Allow the recovery to proceed normally. You can monitor the progress of the recovery with logstat and iimonitor or the ipm utility.

Many other normal states appear on the status line. For additional information on reading logstat output, see the Logstat section in the Command Reference.
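The problem states in the table can be picked out of logstat output with a small filter. This is a sketch under a stated assumption: the state names appear literally, separated by spaces or commas, on a line containing the word "Status". The exact logstat layout is not relied on, and logstat_hints is a hypothetical helper name.

```shell
# Read logstat output on stdin and print one hint per problem state found.
# Normal states such as ONLINE and ECPDONE produce no output.
logstat_hints() {
    grep -i 'status' | tr -s ' ,\t' '\n\n\n' | while read -r state; do
        case $state in
            LOGFULL)        echo "LOGFULL: log file is full; check whether ARCHIVE is also shown" ;;
            START_ARCHIVER) echo "START_ARCHIVER: the Archiver has stopped; restart it" ;;
            FORCE_ABORT)    echo "FORCE_ABORT: the oldest open transaction is being aborted" ;;
            RECOVER)        echo "RECOVER: recovery is in progress; allow it to finish" ;;
        esac
    done
}
```

Typical use is logstat | logstat_hints. Empty output means none of the four problem states was found on a status line.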

How to Avoid Logfull Abort

To avoid logfull aborts, examine the transaction processing strategy errors that caused the log file to fill. Consider the following points:

• If transactions do many updates or are left open for a long time, you must either commit them more frequently or increase the size of the log file. If the log file size is increased, check the “percentage of logfile written” before a consistency point is taken. For details, see the ipm, logstat, or VDBA output.

• Avoid continuous transaction errors by:

– Setting autocommit on where applicable

– Avoiding application errors that are caused by beginning transactions before they get all the user input required or by failing to use Vector application timeout features. (Timeout features are discussed in the SQL Language Guide.)

• If you have taken the above precautions but still get log full/abort, the transaction log is sized too small for the types of transactions you need to process. Read the tips in the next section on choosing the correct transaction log file size.

• Use set log_trace to examine the size and type of log records being written to the log file.

Process of Resizing Transaction Logs

Resizing transaction logs involves these steps:

1. Determine whether the log requires resizing

2. Resize the transaction log file

3. Reestablish Dual Logging

Determine Whether a Log Requires Resizing

To determine whether a log requires resizing, use one of the following procedures.

Using VDBA’s Performance Monitor, start a monitoring session on the desired node:

1. Select the node in the left pane and click the Monitor toolbar button.

2. Select Log Information in the branch.

3. Click the Header tab in the right pane to display log usage.

Start the Interactive Performance Monitor utility (ipm).

1. Highlight the Log_info menu item and select the Select option.

2. Highlight the Header menu item and select the Select option.

3. The Log file Diagram shows the percentage use of the Transaction Log.

4. Exit the ipm program.

Find the current peak transaction load by using the logstat utility to monitor the amount of log file that is in use during your peak hours. This allows you to choose the correct size for your log file. Because requirements change, you must monitor the log file regularly.

1. Enter the logstat command.

2. Find the value under % of log file in use.

3. If the log file needs to be resized, select the new size.

Use mkrawlog to reconfigure a raw log file.

4. Shut down the installation by entering the ingstop command.

Resize the Transaction Log File

To resize the transaction log file, follow these steps:

1. Shut down the Vector installation by entering the ingstop command.

2. Start the Configuration-By-Forms (cbf) or Configuration Manager (vcbf) utility.

a. Choose Transaction Log from the menu.

The file name is displayed along with the current size of the Transaction Log.

b. Choose Destroy.

When destroyed, the Transaction Log information table is emptied.

c. Select the Create option.

d. Enter the required Transaction Log file size in megabytes.

The transaction log is created.

3. Exit cbf or vcbf.

4. Restart the installation by entering the command ingstart.

IMPORTANT! Reconfiguring your log file destroys the current contents of the file and leaves an empty, reinitialized log file after the reconfiguration is complete. When Vector is shut down, all transactions are written to disk; therefore, to prevent inconsistent database problems, reconfigure the log file only after a successful installation shutdown procedure.

Reestablish Dual Logging

A failure in one of the two dual log files is most often symptomatic of hardware problems. In the event of such a failure, you must reestablish dual logging at the first opportunity.

To reestablish dual logging after a failure of one of the log files, follow these steps:

1. Shut down the installation.

2. Create a new log file in the location of the failed log.

3. Start the Configuration-By-Forms (cbf) or Configuration Manager (vcbf) utility.

4. Select dual transaction log.

5. Select Reformat.

This automatically determines which of the log files is valid and copies the contents of the valid log to the newly created log file. After the copy is complete, both log files are marked valid. Dual logging is reestablished.

6. Restart the installation with the ingstart command.

7. Check that both log files are enabled by using Configuration-By-Forms (cbf) or Configuration Manager (vcbf).

Resource and Maintenance Problems

Good performance requires planning and regular maintenance. Make sure your operating system is configured with sufficient resources for Vector. Insufficient system resources cause deficient performance or prevent Vector from starting.

Identifying Operating System Resource Problems

The following tools can help you identify operating system resource problems:

• Review the minimum requirements for a basic Vector installation given in the Readme file. If your environment requires more resources, use the Vector utilities to verify that there are enough resources.

• System resources can be monitored by some operating system utilities. Syntax details are described in Linux Operating System Utilities.

BSD:

pstat utility—to display the status of Linux system tables and system swap space

vmstat utility—to display virtual memory status

System V:

• sar utility—to display activity of various system resources such as CPU utilization, swapping activity, and disk activity.

VMS:

• show memory—displays the system memory resources and the amount of non-paged dynamic memory (total, free and in use).

• show process/id=pid/continuous—displays the amount of page faulting, working set, buffered I/O, and direct I/O the server is doing.

• show device—indicates if a disk drive is out of disk space.

• show device /files—if there is a problem starting an installation, this command can be used to make sure that a Vector process is not holding on to a mailbox.

The installation utility allows the examination of all Vector installed images, showing the number of global pages and sections available and used.

Check System Resources

If Vector seems slow or unresponsive for no apparent reason, system resources may be insufficient. Follow these steps to diagnose the problem. Write down any error messages you receive when performing these steps:

1. Connect to your DBMS Server through Vector monitors:

a. First display the server_number of your DBMS Server using the iinamu utility:

iinamu

IINAMU> show ingres

b. Connect to the DBMS Server monitor by typing the command:

iimonitor server_number

c. To see the DBMS Server sessions, at the iimonitor prompt type:

IIMONITOR> show sessions

d. Check the status of the sessions to determine which one is making excessive use of the server. (You can use VDBA to check session status.)

For syntax details, see the sections iimonitor and iinamu in the Command Reference.

2. If repeated “show sessions” commands in iimonitor show that the query session is continually in a CS_EVENT_WAIT (LOCK) state, the problem involves concurrency and locking.

Alternatively, you can use the VDBA Performance Monitor to check for this problem.

a. Select Servers in the left pane of the Performance Monitor.

b. Select INGRES.

c. Select Sessions in the Servers.

3. If the session alternates between CS_EVENT_WAIT and CS_COMPUTABLE, this indicates that the query is processing. However, if the query is taking an excessive amount of time, set up a trace on it, as described in Trace Utilities.

a. Interrupt the query that is running:

• Interactively, use Ctrl+C and wait.

• In batch or background mode, use the following command to terminate:

kill pid

where pid is the process ID of the query.

The command format “stop proc” must be used only as a last resort. Use of this option can cause more problems than it solves.

b. Issue the command set qep.

• Rerun the query. This outputs a query execution plan.

• Alternatively, start an SQL window on the database in VDBA and click the Display Query Execution Plan button to graphically display the query plan.

• Interrupting a query requires some time because Vector is optimized to commit rather than back out of transactions. It takes at least as long to back out of a transaction as to process the transaction normally. The transaction must be fully backed out before sessions can resume and locks are freed.

4. It is useful to note whether the query runs differently when called from other Vector tools. For example, try issuing the same query from Interactive SQL and from Embedded SQL.

5. Determine if you can access all data in the tables in all components of the query.

a. From the Terminal Monitor type:

select count(*) from tablename

If the count succeeds, Vector can sequentially access every row in the table, which suggests that the problem lies in another access path (secondary indexes, hash pointers, B-Tree page pointers, and so on). Queries using restrictive where clauses are probably using these secondary access methods.

b. Check for permits that apply to this data by typing the following command from the Terminal Monitor:

help permit tablename
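For the repeated show sessions check in step 2 of this procedure, a small filter can count how many sessions report the lock-wait state in each sample. This is a minimal sketch: count_lock_waits is a hypothetical helper, and it assumes only that the state string named in step 2 appears literally in the captured output, one session per line.

```shell
# Count lines of captured "show sessions" output that report the
# lock-wait state. Reads the captured output on stdin.
count_lock_waits() {
    grep -cF 'CS_EVENT_WAIT (LOCK)'
}
```

Save the output of show sessions to a file several times, a few seconds apart, and run count_lock_waits < file on each sample. A count that stays above zero for the same session across samples points to a concurrency and locking problem.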

What You Need Before Contacting Actian Support

Before contacting Actian Support, gather as much information as possible.

Save any relevant errors in each of the following logs:

• errlog.log

• iiacp.log

• iircp.log

Have the following information available:

• Your Contact ID.

• Your exact Vector version. Obtain this with the following command at the operating system prompt:

cat $II_SYSTEM/ingres/version.rel

• Your exact operating system version. Obtain this with the following command at the operating system prompt:

uname -a

Or:

cat /etc/motd

• The current Vector installation environment. Use the following command at the operating system prompt:

ingprenv > filename

• The current user environment. Use the command:

BSD:

printenv > filename

System V:

env > filename

• A clear description of what you are trying to do

• An indication of whether the failure occurs reproducibly
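The items above that come from commands and files can be gathered in one pass. The following is a hedged sketch rather than a supported tool: collect_support_info and the output file names are hypothetical, the log and version file paths are the ones given in this chapter, and every Vector-specific step is skipped when the command or file is not present.

```shell
# Collect version, environment, and log files into one directory.
collect_support_info() {
    out=$1
    mkdir -p "$out" || return 1

    uname -a > "$out/os_version.txt"    # operating system version
    env > "$out/user_env.txt"           # current user environment

    # Vector installation environment, if ingprenv is on PATH.
    if command -v ingprenv >/dev/null 2>&1; then
        ingprenv > "$out/vector_env.txt"
    fi

    # Version file and error logs, if II_SYSTEM is set in this shell.
    if [ -n "${II_SYSTEM:-}" ]; then
        for f in version.rel files/errlog.log files/iiacp.log files/iircp.log; do
            [ -f "$II_SYSTEM/ingres/$f" ] && cp "$II_SYSTEM/ingres/$f" "$out/"
        done
    fi
    return 0
}
```

For example, collect_support_info /tmp/vector_support copies whole log files, so trim them to the relevant errors if they are large. Your Contact ID, the description of what you were doing, and whether the failure is reproducible still need to be written down by hand.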