Designing .NET Applications for Performance Optimization
Developing performance-oriented .NET applications is not easy. The ADO.NET data providers do not throw exceptions to say that your code is running too slowly.
Retrieving Data
To retrieve data efficiently, return only the data that you need, and choose the most efficient method of doing so. The guidelines in this section will help you to optimize system performance when retrieving data with .NET applications.
Understanding the Architecture
ADO.NET applications access data through ADO.NET data providers; all runtime data access in .NET goes through these providers.
ADO.NET development wizards use OLE DB to populate the grids of the wizards. Although OLE DB is accessed for these wizards at design time, the runtime components generated use ADO.NET data providers for accessing data. Optimizing data access via OLE DB in a .NET application is not a useful exercise. Instead, optimize the ADO.NET data provider as the component that will do all of the runtime work.
Retrieving Long Data
Unless it is necessary, applications should not request long data because retrieving long data across a network is slow and resource-intensive.
Most users don't want to see long data. If a user does want to see one of these items, the application can query the database again, specifying only the long columns in the select list. This method allows the average user to retrieve the result set without paying a high performance penalty in network traffic.
Although the best method is to exclude long data from the select list, some applications do not formulate the select list before sending the query to the ADO.NET data providers (that is, some applications use syntax such as select * from <table name> ...). If the select list contains long data, then some data providers must retrieve that data at fetch time even if the application does not bind the long data in the result set. When possible, try to implement a method that does not retrieve all columns of the table.
Sometimes long data must be retrieved. When this is the case, remember that most users do not want to see 100 KB, or more, of text on the screen.
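The deferred approach described above can be sketched as two queries: a list query that excludes the long column, and a detail query issued only on demand. The sketch below assumes a hypothetical emp table with a long resume column and an already-open DbConnection named conn; the "?" marker style is ODBC-like and varies by provider (SqlClient, for example, uses named markers).

```csharp
// List query: no long columns in the select list.
DbCommand listCmd = conn.CreateCommand();
listCmd.CommandText = "SELECT empno, ename FROM emp";
using (DbDataReader reader = listCmd.ExecuteReader())
{
    while (reader.Read())
        Console.WriteLine("{0}: {1}", reader.GetInt32(0), reader.GetString(1));
}

// Detail query: fetch the long column only when the user asks for it.
DbCommand detailCmd = conn.CreateCommand();
detailCmd.CommandText = "SELECT resume FROM emp WHERE empno = ?";
DbParameter empNo = detailCmd.CreateParameter();
empNo.Value = 7369;   // hypothetical employee number
detailCmd.Parameters.Add(empNo);
string resume = (string)detailCmd.ExecuteScalar();
```

Only the one user who drills into a record pays the cost of moving the long data across the network.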
Reducing the Size of Data Retrieved
To reduce network traffic and improve performance, you can reduce the size of any data being retrieved to some manageable limit by calling set max rows or set max field size, or some other database-specific command to limit row size or field size. Another method of reducing the size of the data being retrieved is to decrease the column size. If the data provider allows you to define the packet size, use the smallest packet size that will meet your needs.
In addition, be careful to return only the columns and rows you need. If you return five columns when you only need two, performance is decreased, especially if the unnecessary columns include long data.
Using CommandBuilder Objects
It is tempting to use CommandBuilder objects because they generate SQL statements for you. However, this shortcut can have a negative effect on performance. Because of concurrency restrictions, the CommandBuilder does not generate efficient SQL statements. For example, the following SQL statement was created by the CommandBuilder:
CommandText: UPDATE TEST01.EMP SET EMPNO = ?, ENAME = ?, JOB = ?, MGR = ?, HIREDATE = ?, SAL = ?, COMM = ?, DEPT = ?
WHERE
( (EMPNO = ?) AND ((ENAME IS NULL AND ? IS NULL)
OR (ENAME = ?)) AND ((JOB IS NULL AND ? IS NULL)
OR (JOB = ?)) AND ((MGR IS NULL AND ? IS NULL)
OR (MGR = ?)) AND ((HIREDATE IS NULL AND ? IS NULL)
OR (HIREDATE = ?)) AND ((SAL IS NULL AND ? IS NULL)
OR (SAL = ?)) AND ((COMM IS NULL AND ? IS NULL)
OR (COMM = ?)) AND ((DEPT IS NULL AND ? IS NULL)
OR (DEPT = ?)) )
An application developer can often write more efficient update and delete statements than those that the CommandBuilder generates.
Another drawback is also implicit in the design of the CommandBuilder object. The CommandBuilder object is always associated with a DataAdapter object and registers itself as a listener for RowUpdating and RowUpdated events that the DataAdapter object generates. This means that two events must be processed for every row that is updated.
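Instead of letting a CommandBuilder generate the statement, you can hand-write an update keyed on the primary key only and attach it to the adapter yourself. A minimal sketch, assuming an ODBC-style provider with "?" markers and the EMP table from the example above (column types are illustrative):

```csharp
// Hand-written update: WHERE clause tests only the primary key,
// not every column as the CommandBuilder's statement does.
OdbcCommand update = new OdbcCommand(
    "UPDATE TEST01.EMP SET ENAME = ?, SAL = ? WHERE EMPNO = ?", conn);
// The fourth argument maps each parameter to its DataSet source column.
update.Parameters.Add("@ename", OdbcType.VarChar, 10, "ENAME");
update.Parameters.Add("@sal",   OdbcType.Decimal,  0, "SAL");
update.Parameters.Add("@empno", OdbcType.Int,      0, "EMPNO");

OdbcDataAdapter adapter =
    new OdbcDataAdapter("SELECT * FROM TEST01.EMP", conn);
adapter.UpdateCommand = update;   // no CommandBuilder, no event listeners
```

Because no CommandBuilder is registered, the RowUpdating/RowUpdated listener overhead described above is avoided as well.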
Choosing the Right Data Type
Retrieving and sending certain data types can be expensive. When you design a schema, select the data type that can be processed most efficiently. For example, integer data is processed faster than floating-point data. Floating-point data is defined according to internal database-specific formats, usually in a compressed format. The data must be decompressed and converted into a different format so that it can be processed by the wire protocol.
Processing time is shortest for character strings, followed by integers, which usually require some conversion or byte ordering. Processing floating-point data and timestamps is at least twice as slow as processing integers.
Selecting .NET Objects and Methods
The guidelines in this section will help you to optimize system performance when selecting and using .NET objects and methods.
Using Parameter Markers as Arguments to Stored Procedures
When calling stored procedures, always use parameter markers for arguments instead of literal values.
When you set the CommandText property in the Command object to the stored procedure name, do not physically code the literal arguments into the CommandText. For example, do not use literal arguments such as:
{call expense (3567, 'John', 987.32)}
ADO.NET data providers can call stored procedures on the database server by executing the procedure as any other SQL query. Executing the stored procedure as a SQL query results in the database server parsing the statement, validating the argument types, and converting the arguments into the correct data types.
In the following example, the application programmer might assume that the only argument to getCustName is the integer 12345:
{call getCustName (12345)}
However, SQL is always sent to the database server as a character string. When the database server parses the SQL query and isolates the argument value, the result is still a string. The database server must then convert the string '12345' into the integer 12345. Using a parameter marker eliminates the need to convert the string and reduces the amount of processing by the server:
{call getCustName (?)}
Designing .NET Applications
The guidelines in this section will help you to optimize system performance when designing .NET applications.
Managing Connections
Connection management is important to application performance. Optimize your application by connecting once and using multiple statement objects, instead of performing multiple connections. Avoid connecting to a data source after establishing an initial connection.
You can improve performance significantly with connection pooling, especially for applications that connect over a network or through the World Wide Web. Connection pooling lets you reuse connections. Closing connections does not close the physical connection to the database. When an application requests a connection, an active connection is reused, thus avoiding the network I/O needed to create a new connection.
Pre-allocate connections. Decide which connection strings your application requires. Remember that each unique connection string creates a new connection pool.
Once created, connection pools are not destroyed until the active process ends or the connection lifetime is exceeded. Maintenance of inactive or empty pools involves minimal system overhead.
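With the SQL Server provider, for example, pooling is controlled through connection-string keywords, and pools are keyed on the exact string. The two strings below differ only in Max Pool Size, yet each creates its own pool (server and database names here are placeholders):

```csharp
// Two distinct pools: the strings are not character-for-character equal.
string poolA = "Server=dbhost;Database=sales;Integrated Security=true;" +
               "Pooling=true;Min Pool Size=5;Max Pool Size=50";
string poolB = "Server=dbhost;Database=sales;Integrated Security=true;" +
               "Pooling=true;Min Pool Size=5;Max Pool Size=100";
```

Standardizing on as few connection strings as possible keeps the number of pools, and the number of idle physical connections, small.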
Connection and statement handling should be addressed before implementation. Time spent thoughtfully handling connection management improves both application performance and maintainability.
Opening and Closing Connections
Open connections just before they are needed. Opening them earlier than necessary decreases the number of connections available to other users and can increase the demand for resources.
To keep resources available, explicitly Close the connection as soon as it is no longer needed. If you wait for the garbage collector to implicitly clean up connections that go out of scope, the connections are not returned to the connection pool immediately, tying up resources that are not actually being used.
Close connections inside a finally block. Code in the finally block always runs, even if an exception occurs. This guarantees explicit closing of connections. For example:
try
{
    DBConn.Open();
    … // Do some other interesting work
}
catch (Exception ex)
{
    // Handle exceptions
}
finally
{
    // Close the connection
    if (DBConn != null)
        DBConn.Close();
}
If you are using connection pooling, opening and closing connections is not an expensive operation. Using the Close() method of the data provider's Connection object adds or returns the connection to the connection pool. Remember, however, that closing a connection automatically closes all DataReader objects that are associated with the connection.
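The same guarantee can be written more compactly with a using block, which calls Dispose (and therefore Close) on every exit path, including exceptions. A sketch, assuming a SQL Server provider and a connectionString variable defined elsewhere:

```csharp
// Dispose runs whether the block exits normally or via an exception,
// returning the connection to the pool just as Close() in finally would.
using (SqlConnection dbConn = new SqlConnection(connectionString))
{
    dbConn.Open();
    // ... do some other interesting work
}   // dbConn is closed here
```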
Using Statement Caching
A statement cache is a group of prepared statements or instances of Command objects that can be reused by an application. Using statement caching can improve application performance because the actions on the prepared statement are performed once even though the statement is reused multiple times over an application’s lifetime.
A statement cache is owned by a physical connection. After being executed, a prepared statement is placed in the statement cache and remains there until the connection is closed.
Caching all of the prepared statements that an application uses might appear to offer increased performance. However, this approach may come at a cost of database memory if you implement statement caching with connection pooling. In this case, each pooled connection has its own statement cache that may contain all of the prepared statements that are used by the application. All of these pooled prepared statements are also maintained in the database’s memory.
Using Commands Multiple Times
Choosing whether to use the Command.Prepare method can have a significant positive (or negative) effect on query execution performance. The Command.Prepare method tells the underlying data provider to optimize for multiple executions of statements that use parameter markers. Note that it is possible to Prepare any command regardless of which execution method is used (ExecuteReader, ExecuteNonQuery, or ExecuteScalar).
Consider the case where an ADO.NET data provider implements Command.Prepare by creating a stored procedure on the server that contains the prepared statement. Creating stored procedures involves substantial overhead, but the statement can then be executed multiple times. Although creating the stored procedure is performance-expensive, each subsequent execution is cheap because the query is parsed and optimization paths are stored at procedure-creation time. Applications that execute the same statement multiple times can benefit greatly from calling Command.Prepare and then executing that command repeatedly.
However, using Command.Prepare for a statement that is executed only once results in unnecessary overhead. Furthermore, applications that use Command.Prepare for large single execution query batches exhibit poor performance. Similarly, applications that either always use Command.Prepare or never use Command.Prepare do not perform as well as those that use a logical combination of prepared and unprepared statements.
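The prepare-once, execute-many pattern looks like this. The sketch assumes a SQL Server provider; RaiseRecord and the raises collection are hypothetical names standing in for whatever per-row data the application iterates over:

```csharp
// Prepare once, then rebind parameter values and re-execute in a loop.
// Worth doing only when the command really is executed repeatedly.
SqlCommand cmd = new SqlCommand(
    "UPDATE emp SET sal = @sal WHERE empno = @empno", conn);
cmd.Parameters.Add("@sal",   SqlDbType.Money);
cmd.Parameters.Add("@empno", SqlDbType.Int);
cmd.Prepare();   // one-time cost

foreach (RaiseRecord r in raises)   // hypothetical source of updates
{
    cmd.Parameters["@sal"].Value   = r.NewSalary;
    cmd.Parameters["@empno"].Value = r.EmpNo;
    cmd.ExecuteNonQuery();          // reuses the prepared statement
}
```

For a statement executed only once, skip the Prepare call entirely.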
Using Native Managed Providers
Bridges into unmanaged code, that is, code outside the .NET environment, adversely affect performance. Calling unmanaged code from managed code causes the data provider to be significantly slower than data providers that are completely managed code. Why take that kind of performance hit?
If you use a bridge, your code will be written for this bridge. Later, when a database-specific ADO.NET data provider becomes available, the code must be rewritten; you will have to rewrite object names, schema information, error handling, and parameters. You'll save valuable time and resources by coding to managed data providers instead of coding to the bridges.
Updating Data
This section provides general guidelines to help you to optimize system performance when updating data in databases.
Using the Disconnected DataSet
Keep result sets small. The full result set must be retrieved from the server before the DataSet is populated. The full result set is stored in memory on the client.
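One way to cap what lands in memory is the Fill overload that takes a starting record and a maximum record count. A sketch, assuming a SQL Server provider and an emp table (note that, depending on the provider, the server may still send the discarded rows across the network, so a WHERE clause or TOP-style limit in the query itself is often better still):

```csharp
// Populate the DataSet with at most 100 rows of the result set.
SqlDataAdapter adapter =
    new SqlDataAdapter("SELECT empno, ename FROM emp", conn);
DataSet ds = new DataSet();
adapter.Fill(ds, 0, 100, "emp");   // startRecord = 0, maxRecords = 100
```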
Synchronizing Changes Back to the Data Source
You must build the logic into the PsqlDataAdapter for synchronizing the changes back to the data source using the primary key, as shown in the following example:
string updateSQL = "UPDATE emp SET sal = ?, job = ?" +
    " WHERE empno = ?";
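To make the adapter usable, each marker must be bound to its DataSet source column so that Update() can push changed rows back. The sketch below uses the Psql* classes named in this section; the exact Parameters.Add signature and PsqlDbType names are assumptions modeled on other ADO.NET providers, so check them against your provider's documentation:

```csharp
// Bind each "?" marker to the DataSet column it should draw its value from.
string updateSQL = "UPDATE emp SET sal = ?, job = ? WHERE empno = ?";
PsqlCommand updateCmd = new PsqlCommand(updateSQL, conn);
updateCmd.Parameters.Add("sal",   PsqlDbType.Decimal).SourceColumn = "sal";
updateCmd.Parameters.Add("job",   PsqlDbType.VarChar).SourceColumn = "job";
updateCmd.Parameters.Add("empno", PsqlDbType.Integer).SourceColumn = "empno";

PsqlDataAdapter adapter =
    new PsqlDataAdapter("SELECT empno, sal, job FROM emp", conn);
adapter.UpdateCommand = updateCmd;
// ... modify rows in the DataSet, then:
// adapter.Update(dataSet, "emp");
```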