SQL BLOG WITH LOG: Thursday, March 25, 2004

Introduction
Good Application Architecture is essential for any application to perform better than best ,but without the proper Database Support it becomes a distanct dream.

Software Enggs particularly our DBA s spend hours discussing the performance pit falls of any database design,But once production starts or in testing,whole new world of issues arise with respect to Database. Most important of them is the performance,But how do we say that any issue is performance issue?or for that matter rate the performnce of SQL Server.

May be the Hard ware is faulty or middle tier code is faulty (If written by me) .This Paper provides us a judgement tool and insight complexities involved on SQL 2000 performance.

Before studying this paper I request the audience to acknowledge themselves with various SQL Administration fundamentals.

Principles for Performance Tuning SQL Server
Microsoft SQL Server 2000 introduces and enhances methods and tools to tune SQL Server for optimum performance. Keep these principles in mind when you are tuning SQL Server:
• Let SQL Server do most of the tuning.
Microsoft SQL Server has been dramatically enhanced to create an auto-configuring and self-tuning database server. Take advantage of the auto-tuning settings available with SQL Server. These settings help SQL Server run at peak performance even as user load and queries change over time.
• RAM is a limited resource.
An integral feature of the database server environment is the management of RAM buffer cache. Access to data in RAM cache is much faster than access to the same information from disk, but RAM is a limited resource. If database I/O can be reduced to the minimum required set of data and index pages, these pages will stay in RAM longer. Too much unneeded data and index information flowing into buffer cache quickly push out valuable pages. The focus of performance tuning is to reduce I/O so that buffer cache is best utilized.
• Pick good indexes.
A key factor in maintaining minimum I/O for all database queries is to ensure that good indexes are created and maintained.
• Evaluate disk I/O subsystem performance.
The physical disk subsystem must provide a database server with sufficient I/O processing power for the database server to run without disk queuing. Disk queuing results in poor performance. This document describes how to detect and resolve disk I/O problems.
• Tune applications and queries.
Tuning your applications and queries becomes especially important when a database server will service requests from hundreds or thousands of connections by way of a given application. Because applications typically determine the SQL queries that will be executed on a database server, application developers must understand SQL Server architectural basics and how to take full advantage of SQL Server indexes to minimize I/O.
• Take advantage of SQL Server Profiler and Index Tuning Wizard.
SQL Server Profiler can be used to monitor and log a SQL Server workload, which can then be submitted to the Index Tuning Wizard to tune indexes for better performance. Regular use of SQL Server Profiler and the Index Tuning Wizard helps you optimize the indexes, allowing SQL Server to perform well with changing query workloads.
• Take advantage of SQL Server Performance Monitor.
SQL Server provides a revised set of Performance Monitor objects and counters, which are designed to provide helpful information for monitoring and analyzing the operations of SQL Server. This document describes key Performance Monitor counters to watch.
• Take advantage of Graphical Showplan and SQL Server Query Analyzer.
SQL Server Query Analyzer introduces Graphical Showplan, an enhancement to help analyze problematic Transact-SQL queries. SQL Server Query Analyzer also includes STATISTICS IO, another important tool option for tuning queries.
Review the max async IO Option During Configuration
The max async IO option should be reviewed and adjusted if necessary during your initial configuration of Microsoft SQL Server .

The max async IO option default of 32 is sufficient for lower-end disk subsystems. With a higher-end RAID storage subsystem attached to a database server that is capable of high disk I/O transfer rates, the setting of 32 may be inadequate because the RAID subsystem is capable of completing many more simultaneous disk transfer requests than 32. If the SQL Server write activity also dictates the need for more disk transfer capability, the max async IO option should be set higher.

An appropriate value for the max async IO option is one that allows checkpoint to finish before another checkpoint is needed (based upon desired recovery characteristics), but not to finish so fast that the system is seriously stressed by the event (a symptom of stress is disk queuing, which is discussed in further detail later in this document).
A general rule to follow for setting the max async IO option for SQL Servers running on large disk subsystems is to multiply by two or three the number of physical drives available to do simultaneous I/O. Then watch Performance Monitor for signs of disk activity or queuing issues. The negative impact of setting this configuration option too high is that it may cause checkpoint to monopolize disk subsystem bandwidth that is required by other SQL Server I/O operations such as reads.

To set the max async IO option value, execute this command in SQL Server Query Analyzer: sp_configure ‘max async io’, value, where value is expressed as the number of simultaneous disk I/O requests that SQL Server system can submit to the Windows® operating system during a checkpoint operation, which in turn submit the requests to the physical disk subsystem. For more information, see “Disk I/O Tuning Performance” later in this document. This configuration option is dynamic and does not require a stop and restart of SQL Server to take effect.

For more information, see these topics in SQL Server Books Online: I/O Architecture and max async io Option and check SP_CONFIGURE options.

Components that Consume CPU and Disk I/O Resources
You can optimize SQL Server performance with some careful attention to Microsoft SQL Server components that can consume resources.
Worker Threads

Microsoft SQL Server maintains a pool of Windows operating system threads to service batches of SQL Server commands submitted to the database server. The total of these threads (called worker threads) available to service all incoming command batches is dictated by the setting for the sp_configure option max worker threads. If the number of connections actively submitting batches is greater than the number specified for max worker threads, the worker threads are shared among connections actively submitting batches. The default setting of 255 works well for many installations.
Worker threads write out most of the dirty 8-KB pages from the SQL Server buffer cache. I/O operations are scheduled by worker threads asynchronously for maximum performance.
For more information, see these topics and/or keywords in SQL Server Books Online: Optimizing Server Performance Using Memory Configuration Options, SQL Server Memory Pool, Transaction Recovery, Write-Ahead Transaction Log, Freeing and Writing Buffer Pages, and SQL Server threads.
Lazy Writer
SQL Server lazy writer helps produce free buffers, which are 8-KB data cache pages without any data contained in them. As lazy writer flushes each 8-KB cache buffer out to disk, it initializes the cache page identity so that other data can be written into the free buffer. Lazy writer produces free buffers during periods of low disk I/O, so disk I/O resources are readily available for use and there is minimal impact on other SQL Server operations.
SQL Server automatically configures and manages the level of free buffers. Monitor the SQL Server: Buffer Manager - Free Buffers object to ensure that the free buffer level remains steady. Lazy writer maintains that the level of free buffers keeps up with the user demand for free buffers. The SQL Server: Buffer Manager - Free Buffers object should not drop to zero because this indicates there were times the user load demanded a higher level of free buffers than the SQL Server lazy writer was able to provide.
If the lazy writer cannot keep the free buffer steady, or at least above zero, it could mean the disk subsystem cannot provide lazy writer with the disk I/O performance that it needs to maintain the free buffer level (compare drops in free buffer level to any disk queuing to confirm there is a disk subsystem problem). One solution to the disk queuing problem is to add more physical disk drives (also called spindles) to the database server disk subsystem to provide more disk I/O processing power. The SQL Server: Buffer Manager – Lazy Writes/sec object indicates the number of 8-KB pages written to disk by lazy writer.
Monitor the current level of disk queuing in Performance Monitor by looking at the counters for (logical or physical) Disk: Average Disk Queue or Current Disk Queue and ensure the disk queue is at a level less than 2 for each physical drive associated with any SQL Server activity. For database servers that employ hardware RAID controllers and disk arrays, divide the number reported by (logical or physical) disk counters by the number of actual hard disk drives associated with that logical drive letter or physical hard disk drive number reported by the Windows NT® Disk Administrator program. Windows and SQL Server are unaware of the actual number of physical hard disk drives attached to a RAID controller. You should know the number of drives associated with RAID array controller to interpret the disk queue numbers Performance Monitor reports.
Adjust lazy writer disk I/O request behavior with the max async IO option, which controls the number of 8-KB disk write requests that SQL Server (including requests coming in from lazy writer, checkpoint, and the worker threads) can simultaneously submit to the Windows operating system and in turn, to the disk I/O subsystem. If disk queuing occurs at unacceptable levels, decrease the level of the max async IO option. If the currently configured level of the max async IO option must be maintained, add more disks to the disk subsystem until disk queuing reaches acceptable levels.
For more information, see these topics in SQL Server Books Online: Freeing and Writing Buffer Pages and Write-Ahead Transaction Log.
Checkpoint
Checkpoint writes out dirty pages to the SQL Server data files. Dirty pages are any buffer cache pages that have been modified since being brought into the buffer cache. A buffer written to disk by checkpoint still contains the page, and users can read or update it without rereading it from disk, which is not the case for free buffers created by lazy writer.
Checkpoint allows worker threads and lazy writer do the majority of the work writing out dirty pages by waiting an extra checkpoint before writing out a dirty page. This provides the worker threads and lazy writer more time to write out the dirty pages. The conditions under which this extra wait time occurs is detailed in your SQL Server documentation. Checkpoint evens out SQL Server disk I/O activity over a longer time period with an extra checkpoint wait.
To make checkpoint more efficient when there are many pages to flush out of cache, SQL Server sorts the data pages to be flushed in the order the pages appear on disk. This sorting helps minimize disk arm movement during cache flush and potentially allows checkpoint to take advantage of sequential disk I/O. Checkpoint also submits 8-KB disk I/O requests asynchronously to the disk subsystem. This allows SQL Server to finish submitting required disk I/O requests faster because checkpoint does not wait for the disk subsystem to report back that the data actually has been written to disk.
You should watch disk queuing on hard disk drives associated with SQL Server data files to notice if SQL Server is sending down more disk I/O requests than the disk(s) can handle. If it is, then more disk I/O capacity must be added to the disk subsystem to handle the load.
Adjust the checkpoint dirty page flushing behavior by using the max async IO option. The sp_configure option max async IO controls the number of 8-KB cache flushes that checkpoint can submit simultaneously to the Windows operating system and in turn, to the disk I/O subsystem. If disk queuing occurs at unacceptable levels, decrease the max async IO option. If SQL Server must maintain the currently configured level of the max async IO option, add more disks to the disk subsystem until disk queuing decreases to acceptable levels.
If you must increase the speed with which SQL Server executes checkpoint and the disk subsystem is powerful enough to handle the increased disk I/O while avoiding disk queuing, then increase the max async IO option to allow SQL Server to send more asynchronous disk I/O requests. Observe the disk queuing counters carefully after you change the max async IO option. Be sure to watch disk read queuing in addition to disk write queuing. If the max async IO option is set too high for a disk subsystem, checkpoint may queue up many disk write I/O requests, which can cause SQL Server read activity to be blocked. Physical disk and logical disk objects in Performance Monitor provide the Average Disk Read Queue Length counter, which can be used to monitor queued disk read I/O requests. If disk read queuing is caused by checkpoint, you can either decrease the max async IO option or add more hard disk drives so the checkpoint and read requests can be handled simultaneously.
Log Manager
Like other major RDBMS products, SQL Server ensures that all write activity (inserts, updates, and deletes) performed on the database is not lost if something interrupts the SQL Server online status (for example, power failure, disk drive failure, fire in the data center, and so on). The SQL Server logging process helps guarantee recoverability. Before any implicit (single Transact-SQL query) or explicit (transaction that issues BEGIN TRANSACTION, COMMIT or ROLLBACK statements) transactions can be completed, the SQL Server log manager must receive a signal from the disk subsystem that all data changes associated with the transaction have been written successfully to the associated log file. This rule guarantees the transaction log can be read and reapplied in SQL Server when the server is turned on after an abrupt shut down during which the transactions written into the data cache are not yet flushed to the data files. Flushing data buffers are checkpoint or lazy writer responsibility. Reading the transaction log and applying the transactions to SQL Server after a server stoppage is referred to as recovery.
Because SQL Server must wait for the disk subsystem to complete I/O to SQL Server log files as each transaction is completed, disks containing SQL Server log files must have sufficient disk I/O handling capacity for the anticipated transaction load.
The method for monitoring disk queuing is different for SQL Server log files than it is for SQL Server database files. You can use the Performance Monitor counters SQL Server: Databases database instance : Log Flush Waits Times and SQL Server: Databases database instance : Log Flush Waits/sec to view log writer requests waiting on the disk subsystem for completion.
For highest performance, you can use a caching controller for SQL Server log files if the controller guarantees that data entrusted to it is written to disk eventually, even if the power fails. For more information about caching controllers, see “Effect of On-Board Cache of Hardware RAID Controllers” later in this document.
For more information, see these topics and/or keywords in SQL Server Books Online: Transaction Recovery, Optimizing Transaction Log Performance, and log manager object.
Read Ahead Manager
The Microsoft SQL Server Read Ahead Manager is completely self-configuring and self-tuning. Read Ahead Manager is tightly integrated with the operations of SQL Server query processor. SQL Server Query Processor identifies and communicates situations that will benefit from read-ahead scans to the Read Ahead Manager. Large table scans, large index range scans and probes into clustered and nonclustered index B-trees are situations that would benefit from a read-ahead. Read-ahead reads occur with 64-KB I/Os, which provide higher disk throughput potential for the disk subsystem than 8-KB I/Os do. When it is necessary to retrieve a large amount of data from SQL Server, read-ahead is the best way to do it.
Read Ahead Manager benefits from the simpler and more efficient Index Allocation Map (IAM) storage structure. The IAM is the SQL Server method of recording the location of extents (eight pages of SQL Server data or index information for a total of 64 KB of information per extent). The IAM is an 8-KB page that tightly packs information through a bitmap about which extents within the range of extents covered by the IAM contain required data. The compact IAM pages are fast to read and tend to keep regularly used IAM pages in buffer cache.
The Read Ahead Manager can construct multiple sequential read requests by combining the query information from query processor and quickly retrieving the location of all extents that must be read from the IAM page(s). Sequential 64-KB disk reads provide excellent disk I/O performance.
Read-ahead activity is monitored by the SQL Server: Buffer Manager - Readahead Pages counter. You can find more information about read-ahead activity by executing the DBCC PERFMON (IOSTATS) statement. Some of the information provided is RA Pages Found in Cache and RA Pages Placed in Cache. If the page is already hashed (the application read it in first and read-ahead wasted a read), it is a page found in cache. If the page is not already hashed (a successful read-ahead), it is a page placed in cache.
Too much read-ahead can be detrimental to overall performance because it can fill cache with unnecessary pages, requiring additional I/O and CPU that could have been used for other purposes. The solution is a performance tuning goal that all Transact-SQL queries are tuned so a minimal number of pages are brought into buffer cache. This includes using the right index for the right job. Save clustered indexes for efficient range scans and define nonclustered indexes to help locate single rows or smaller rowsets quickly.
For more information, see these topics and/or keywords in SQL Server Books Online: Reading Pages, Table and Index Architecture, Heap Structures, DBCC PERFMON, and read-ahead pages.
Disk I/O Performance
When you configure a SQL Server that contains only a few gigabytes of data and does not sustain heavy read or write activity, you are not as concerned with disk I/O and balancing SQL Server I/O activity across hard disk drives for maximum performance. If you build larger SQL Server databases, however, that contain hundreds of gigabytes of data and/or will sustain heavy read and/or write activity, you must configure SQL Server to maximize disk I/O performance by load-balancing across multiple hard disk drives.
Advertised Disk Transfer Rates and SQL Server
An important aspect of database performance tuning is I/O performance tuning. Unless SQL Server is running on a computer with enough RAM to hold the entire database, I/O performance is dictated by how fast reads and writes of SQL Server data can be processed by the disk I/O subsystem.
The typical hard disk drive is capable of providing the Windows operating system and SQL Server with about 75 nonsequential (random) and 150 sequential I/O operations per second. Advertised transfer rates for these hard disk drives are around 40 megabytes (MB) per second. It is much more likely for a database server to be constrained by the 75/150 I/O transfers per second than the 40-MB per second transfer rate. This is illustrated by these calculations:
(75 random I/O operations per second) X (8-KB transfer) = 600 KB per second
This calculation indicates that by doing strictly random single page read and write operations on a hard disk drive, you can expect at most 600 KB (0.6 MB) per second I/O processing capability from that hard disk drive. This is much lower than the advertised 40 MB per second I/O handling capacity of the drive. SQL Server worker threads, checkpoint, and lazy writer perform I/O in 8-KB transfer sizes.
(150 sequential I/O operations per second) X (8-KB transfer) = 1200 KB per second
This calculation indicates by doing strictly sequential single-page read and write operations on a hard disk drive, it is reasonable to expect at most 1200 KB (1.2 MB) per second I/O processing capability from the hard disk drive.
(75 random I/O operations per second) X (64-KB transfer) = 4800 KB (4.8 MB) per second
This calculation illustrates a worse-case scenario for read-aheads, assuming all random I/O. Even in the completely random situation, the 64-KB transfer size provides much better disk I/O transfer rate from disk (4.8 MB per second) than the single-page transfer rates (0.6 and 1.2 MB per second):
(150 sequential I/O operations per second) X (64-KB transfer) = 9600 KB (9.6 MB) per second
This calculation indicates that by doing strictly sequential read or write operations on a hard disk drive, you can expect at most 9.6 MB per second I/O processing capability from that hard disk drive. This is much better than the random I/O case. SQL Server Read Ahead Manager performs disk I/O in the 64-KB transfer rate and attempts to arrange reads so read-ahead scans are done sequentially (often referred to as serially or in disk order). Although Read Ahead Manager performs I/O operations sequentially, page splitting tends to cause extents to be read nonsequentially rather than sequentially. This is one reason to eliminate and prevent page splitting.
Log manager writes sequentially to log files up to 32 KB.
Sequential vs. Nonsequential Disk I/O Operations
The terms sequential and nonsequential (random) have been used to refer to hard disk drive operations. A single hard disk drive consists of a set of drive platters, each of which provides services for read and write operations with a set of arms with read and write heads that can move across the platters and read information from the drive platter or write data onto the platters. Remember these points about hard disk drives and SQL Server:
• The read/write heads and associated disk arms must move to find and operate on the location of the hard disk drive platter that SQL Server and the Windows operating system requested. If the data is located on nonsequential locations on the hard disk drive platter, it takes significantly more time for the hard disk drive to move the disk arm and read/write head to all of the necessary hard disk drive platter locations. This contrasts with the sequential case, in which all of the required data is located on one continuous physical section of the hard disk drive platter, so that the disk arm and read/write heads move a minimal amount to perform the necessary disk I/O operations. The time difference between the nonsequential versus sequential case is significant, about 50 milliseconds per nonsequential seek versus approximately 2-3 milliseconds for sequential seeks. These times are rough estimates and will vary based on how far apart the nonsequential data is spread around on the disk, how fast the hard disk platters can spin (RPM), and other physical attributes of the hard disk drive. The main point is that sequential I/O is good for SQL Server performance.
• A typical hard disk drive supports about 75 nonsequential and 150 sequential I/Os per second. It takes almost as much time to read or write 8 KB as it does to read or write 64 KB. Within the range of 8 KB to about 64 KB the disk arm and read/write head movement account for most of the time of a single disk I/O transfer operation. Perform 64-KB disk transfers as often as possible when more than 64 KB of SQL data must be transferred, because a 64-KB transfer is fast as an 8-KB transfer and eight times the amount of SQL Server data is processed for each transfer. Read Ahead Manager does disk operations in 64-KB chunks (referred to as a SQL Server extent). Log manager also performs sequential writes in larger I/O sizes. By making good use of Read Ahead Manager and separating SQL Server log files from other nonsequentially accessed files you can improve SQL Server performance.
For more information about physical hard disk drives, see the Compaq document, “Disk Subsystem Performance and Scalability.” For the location of this document, see “Finding More Information,” later in this document.
Disk I/O Transfer Rates and PCI Bus Bandwidth
A typical hard disk provides a maximum transfer rate of about 40 MB per second or 75 nonsequential/150 sequential disk transfers per second. Typical RAID controllers have an advertised transfer rate of about 40 MB per second or approximately 2000 disk transfers per second. PCI buses have an advertised transfer rate of about 133 MB per second and higher. The actual transfer rates achievable for a device usually differ from the advertised rate. You should understand how to use these transfer rates as a starting point for determining the number of hard disk drives to associate with each RAID controller and in turn, how many drives and RAID controllers can be attached to a PCI bus without I/O bottlenecks.
Earlier, we calculated that the maximum amount of SQL Server data that can be read from or written to a hard disk drive per second is 9.6 MB. Assuming a RAID controller can handle 40 MB per second, you can calculate the number of hard disk drives that should be associated with one RAID controller by dividing 40 by 9.6 to get about 4. This means that at most four drives should be associated with that one controller when SQL Server is doing nothing but sequential I/O of 64 KB. Similarly, we calculated that with all nonsequential I/O of 64 KB, the maximum data sent from the hard disk drive to the controller is 4.8 MB per second. Dividing 40 MB per second by 4.8 MB per second gives us the result of about 8. This means that at most eight hard disk drives should be associated with the single controller in the nonsequential 64-KB scenario. The random 8-KB data transfer scenario requires the most drives. Divide 40 by 0.6 to determine that about 66 drives are needed to saturate a RAID controller doing 100 percent random 8-KB reads and writes. This is not realistic scenario because read-ahead and log-writing use transfer sizes greater than 8-KB and it is unlikely that a SQL Server will perform 100 percent random I/O.
You can also determine how many drives should be associated with a RAID controller by looking at disk transfers per second instead of looking at the megabytes per second. If a hard disk drive is capable of 75 nonsequential (random) I/Os per second, then about 26 hard disk drives working together could theoretically produce 2000 nonsequential I/Os per second, or enough to hit the maximum I/O handling capacity of a single RAID controller. Alternately, it only takes about 13 hard disk drives working together to produce 2000 sequential I/Os per second and keep the RAID controller running at maximum throughput, since a single hard disk drive can sustain 150 sequential I/Os per second.
RAID controller and PCI bus bottlenecks are not nearly as common as I/O bottlenecks related to hard disk drives. To illustrate, assume a set of hard disk drives associated with a RAID controller is busy enough to push 40 MB per second of throughput through the controller. The next consideration is how many RAID controllers can be safely attached to the PCI bus without risking a PCI bus I/O bottleneck. To make a rough estimate, divide the I/O processing capacity of the PCI bus by the I/O processing capacity of the RAID controller: 133 MB/sec divided by 40 MB/sec results in approximately three RAID controllers that can be attached to a single PCI bus. Most large servers come with more than one PCI bus, which increases the number of RAID controllers that can be installed in a single server.
These calculations illustrate the relationship of the transfer rates of the components that comprise a disk I/O subsystem (hard disk drives, RAID controllers, PCI bus) and are not literal figures. These calculations assume all sequential or all nonsequential data access, which is not likely in a production database server environment. In reality, a mixture of sequential, nonsequential, 8-KB and 64-KB I/O occurs. Additional factors influence how many I/O operations can be pushed through a set of hard disk drives at one time. On-board read/write caching available for RAID controllers increases the amount of I/O that a set of drives can effectively produce. How much more is difficult to estimate for the same reason that it is difficult to determine an exact number an 8-KB versus a 64-KB I/O SQL Server environment needs.
RAID
When scaling databases more than a few gigabytes (GB) it is important to have at least a basic understanding of RAID (Redundant Array of Inexpensive Disks) technology and how it relates to database performance.
These are the benefits of RAID:
• Performance
Hardware RAID controllers divide read/writes of all data from Windows and applications such as Microsoft SQL Server into slices (usually 16 KB - 128 KB) that are spread across all disks participating in the RAID array. Splitting data across physical drives distributes the read/write I/O workload evenly across all physical hard disk drives participating in the RAID array. This increases disk I/O performance because the hard disks participating in the RAID array are all kept equally busy, instead of some disks becoming a bottleneck due to uneven distribution of I/O requests.
• Fault Tolerance
RAID provides protection from hard disk failure and accompanying data loss with two methods: mirroring and parity.
Mirroring is implemented by writing information onto two sets of drives, one on each side of the mirrored pairs of drives. If there is a drive loss with mirroring in place, the data for the lost drive can be rebuilt by replacing the failed drive and rebuilding the data from the failed drive’s matching drive. Most RAID controllers can provide a failed drive replacement and rebuild from the other side of the mirrored pair while Windows and SQL Server are online (referred to as “Hot Plug” capable drives). Mirroring is the best performing RAID option when fault tolerance is required. Each SQL Server write in the mirroring situation costs two disk I/O operations, once to each side of the mirrorset. Mirroring also provides more fault tolerance than parity RAID implementations. Mirroring can sustain at least one failed drive and may be able to survive failure of up to half of the drives in the mirrorset without forcing the system administrator to shut down the server and recover from file backup. The disadvantage of mirroring is cost. The disk cost of mirroring is one drive for each drive of data. RAID 1 and its hybrid, RAID 0+1 are implemented through mirroring.
Parity is implemented by calculating recovery information about data written to disk and writing this parity information on the other drives that form the RAID array. If a drive fails, a new drive is inserted into the RAID array and the data on that failed drive is recovered by taking the recovery information (parity) written on the other drives and using this information to regenerate the data from the failed drive. RAID 5 and its hybrids are implemented through parity. The advantage of parity is low cost. To protect any number of drives with RAID 5, only one additional drive is required. Parity information is evenly distributed among all drives participating in the RAID 5 array. The disadvantages of parity are performance and fault tolerance. Due to the additional costs associated with calculating and writing parity, RAID 5 requires four disk I/O operations for each Windows NT and SQL Server write as compared to two disk I/O operations for mirroring. Read I/O operation costs are the same for mirroring and parity. Also, RAID 5 can sustain only one failed drive before the array must be taken offline and recovery from backup media must be performed to restore data.
A general rule is to stripe across as many disks as necessary to achieve solid disk I/O performance. Performance Monitor indicates whether there is a disk I/O bottleneck on a particular RAID array. Be ready to add disks and redistribute data across RAID arrays and/or SCSI channels as necessary to balance disk I/O and maximize performance.
Effect of On-board Cache of Hardware RAID Controllers
Many hardware RAID controllers have some form of read and/or write caching. Take advantage of this available caching with SQL Server because it can enhance the effective I/O handling capacity of the disk subsystem. The principle of these controller-based caching mechanisms is to gather smaller and potentially nonsequential I/O requests coming in from the host server (hence SQL Server) and try to batch them with other I/O requests so the batched I/Os can form larger (32 - 128 KB) and maybe sequential I/O requests to send to the hard disk drives. This helps produce more disk I/O throughput given the fixed number of I/Os that hard disks can provide to the RAID controller. The RAID controller cache arranges incoming I/O requests by making the best use of the hard disks underlying I/O processing ability.
RAID controllers usually protect their caching mechanisms with a form of backup power. The backup power can preserve the data written in cache for a period of time (perhaps days) in case of a power outage. And in production environments, the backup power can provide the database server greater protection by providing adequate UPS protection to the server to flush data to disk if power to the server is disrupted.
RAID Levels
RAID 1 and RAID 0+1 offer the best data protections and best performance among RAID levels but costs more required disks. When cost of hard disks is not a limiting factor, RAID 1 or RAID 0+1 are the best RAID choices for performance and fault tolerance.
RAID 5 provides fault tolerance at the best cost but has half the write performance of RAID 1 and 0+1 because of the additional I/O that RAID 5 has to do reading and writing parity information onto disk. RAID 5 is not as fault tolerant as RAID 1 and 0+1.
The best disk I/O performance is achieved with RAID 0 (disk striping with no fault tolerance protection), but because there is no fault tolerance with RAID 0, this RAID level typically can be used only for development database servers or other testing environments.
Many RAID array controllers provide the option of RAID 0+1 (also referred to as RAID 1/0 and RAID 10) over physical hard disk drives. RAID 0+1 is a hybrid RAID solution. On the lower level, it mirrors all data, like normal RAID 1. On the upper level, the controller stripes data across all of the drives (like RAID 0). Thus, RAID 0+1 provides maximum protection (mirroring) with high performance (striping). These mirroring and striping operations are transparent to Windows NT and SQL Server because they are managed by the RAID controller. The difference between RAID 1 and RAID 0+1 is on the hardware controller level. RAID 1 and RAID 0+1 require the same number of drives for a given amount of storage. For more specifics about RAID 0+1 implementation of specific RAID controllers, contact the hardware vendor that produced the controller.
This illustration shows the differences between RAID 0, RAID 1, RAID 5 and RAID 0+1. To hold four disks of data, RAID 1 (and RAID 0+1) needs eight disks, whereas Raid 5 needs five disks. Be sure to involve the appropriate hardware vendors to learn more about RAID implementation specific to the hardware running the database server.
Common RAID Levels

Online RAID Expansion
Online RAID expansion is a feature that allows disks to be added dynamically to a physical RAID array while SQL Server is online, as long as there are hot-plug slots available. Many hardware vendors offer hardware RAID controllers capable of providing this functionality. Data is automatically restriped across all drives evenly, including the newly added drive, and there is no need to shut down SQL Server or Windows. You can take advantage of this functionality by leaving hot-plug hard disk drive slots free in the disk array cages. Thus, if SQL Server is regularly over-taxing a RAID array with I/O requests (this is indicated by disk queue length for the Windows logical drive letter associated with that RAID array), you can install one or more new hard disk drives into the hot-plug slot while SQL Server is still running. The RAID controller redistributes some existing SQL data to these new drives so that SQL data is evenly distributed across all drives in the RAID array. Then, the I/O processing capacity of the new drives (75 nonsequential/150 sequential I/Os per second, per drive) is added to the overall I/O processing capacity of the RAID array.
Performance Monitor and RAID
In Performance Monitor, logical and physical disk objects provide the same information. The difference is that logical disks in Performance Monitor are associated with what Windows NT interprets as a logical drive letter. Physical disks in Performance Monitor are associated with what Windows NT interprets as a single physical hard disk.
To enable Performance Monitor counters, use the command diskperf.exe from the command line of the command-prompt window. Use diskperf –y so that Performance Monitor reports logical and physical disk counters when using hard disk drives or sets of hard disk drives and RAID controllers, without the use of Windows NT software RAID.
When running Windows NT software RAID, use diskperf –ye so that that Performance Monitor correctly reports physical counters across the Windows NT stripesets. When
diskperf –ye is used in conjunction with Windows NT stripesets, logical counters do not report correct information and must be disregarded. If logical disk counter information is required in conjunction with Windows NT stripesets, use diskperf –y instead. With diskperf y and Windows NT stripesets, logical disk counters are reported correctly but physical disk counters do not report correct information and should be disregarded.
The effects of the diskperf -y command do not occur until Windows NT has been restarted.
Hardware RAID controllers present multiple physical hard disk drives that compose a single RAID mirrorset or stripeset to the Windows operating system, as one single physical. Disk Administrator is used to associate logical drive letters to the single physical disk and does not need to know how many hard disks are actually associated with the single hard physical that the RAID controller has presented to it.
You should know how many physical hard disk drives are associated with a RAID array so you can determine the number of disk I/O requests that the Windows operating system and SQL Server send to each physical hard disk drive. Divide the number of disk I/O requests that Performance Monitor reports as associated with a hard disk drive by the number of actual physical hard disk drives known to be in the RAID array.
To estimate I/O activity per hard disk drive in a RAID array, multiply the number of disk write I/Os reported by Performance Monitor by either 2 (RAID 1 and 0+1) or 4 (RAID 5). This accurately accounts for the number of I/O requests being sent to the physical hard disk drives. It is at this physical level that the I/O capacity for hard disk drives apply (75 nonsequential and 150 sequential per drive). But do not expect to calculate exactly how much I/O is hitting the hard disk drives when the hardware RAID controller is using caching, because caching can change the amount of I/O that is hitting the hard disk drives.
It is best to monitor on disk queuing unless I/O is causing a problem. The Windows operating system cannot see the number for physical drives in a RAID array, so to assess disk queuing per physical disk accurately, you must divide the disk queue length by the number of physical drives participating in the hardware RAID disk array that contains the logical drive being observed. Keep this number under two for hard disk drives containing SQL Server files.
For more information about SQL Server and RAID, see these topics in SQL Server Books Online: RAID Levels and SQL Server, Comparing Different Implementations of RAID Levels, Monitoring Disk Activity, Performance Monitoring Example: Identifying Bottlenecks, About Hardware-based Solutions, and RAID.
Windows NT Software RAID
Windows NT provides fault tolerance to hard disk failure by providing mirrorsets and stripesets (with or without fault tolerance) through the Windows NT operating system instead of through a hardware RAID controller. Windows NT Disk Administrator is used to define either mirrorsets (RAID 1) or stripesets with parity (RAID 5). Windows NT Disk Administrator also allows the definition of stripesets with no fault tolerance (RAID 0).
Software RAID utilizes more CPU resources because Windows NT is the component managing the RAID operations versus the hardware RAID controller. Thus, performance with the same number of disk drives and Windows NT software RAID may be a few percent less than the hardware RAID solution if the system processors are near 100 percent utilized. But Windows NT software RAID generally helps a set of drives service SQL Server I/O better overall than those drives would have been able to separately, reducing the potential for an I/O bottleneck, leading to higher CPU utilization by SQL Server and better throughput. Windows NT software RAID can provide a better-cost solution for providing fault tolerance to a set of hard disk drives.
For more information about configuring Windows NT software RAID, see your Windows NT Server Online Help. Also see these topics in SQL Server Books Online: About Windows NT-based Disk Mirroring and Duplexing and About Windows NT-based Disk Striping and Striping with Parity.
Disk I/O Parallelism
With smaller SQL Server databases located on a few disk drives, disk I/O parallelism is not a likely performance factor. But with large SQL Server databases stored on many disk drives, performance is enhanced by using disk I/O parallelism to make optimal use of the I/O processing power of the disk subsystem.
Microsoft SQL Server introduces files and filegroups, which replace the device and segment model from earlier versions of SQL Server. The files and filegroups provide a more convenient method for spreading data proportionately across disk drives or RAID arrays. For more information, see these topics in SQL Server Books Online: Placing Indexes on Filegroups, Placing Tables on Filegroups, Files and Filegroups, and Using Files and Filegroups to Manage Database Growth.
A technique for creating disk I/O parallelism is to create a single “pool of drives” that serves all SQL Server database files, excluding transaction log files. The pool can be a single RAID array that is represented in Windows NT as a single physical disk drive. Or a larger pool can be set up using multiple RAID arrays and SQL Server files/filegroups. A SQL Server file can be associated with each RAID array and the files can be combined into a SQL Server filegroup. Then a database can be built on the filegroup so that the data is spread evenly across all of the drives and RAID controllers. The drive pool methodology depends on RAID to divide data across all physical disk drives to help ensure parallel access to the data during database server operations.
The pool methodology simplifies SQL Server I/O performance tuning because database administrators know there is only one physical location to create database objects. The single pool of drives can be watched for disk queuing and, if necessary, more hard disk drives can be added to the pool to prevent disk queuing. This technique helps optimize for the common case, where it is not known which parts of databases will see the most use. Do not segregate part of available I/O capacity on another disk partition because SQL Server might do I/O to it 5 percent of the time. The single pool of drives methodology can make all available I/O capacity available for SQL Server operations.
SQL Server log files always should be physically separated onto different hard disk drives from all other SQL Server database files. For SQL Servers with busy databases, transaction log files should be physically separated from each other. Transaction logging is primarily sequential write I/O. Separating transaction logging activity from other nonsequential disk I/O activity can result in I/O performance benefits. That allows the hard disk drives containing the log files to concentrate on sequential I/O. There are times when the transaction log must be read as part of SQL Server operations such as replication, rollbacks, and deferred updates. Administrators of SQL Servers that participate in replication should make sure all transaction log files have sufficient disk I/O processing power because of the reads that need to occur.
Physically separating SQL Server objects from the rest of their associated database through SQL Server files and filegroups requires additional administration. Separating the objects can be worthwhile to investigate active tables and indexes. By separating table or index from all other database objects, accurate assessments can be made of the object I/O requirements. This is not as easy to do when all database objects are placed within one drive pool. Physical I/O separation can be appropriate during database development and benchmarking so that database I/O information can be gathered and applied to capacity planning for the production database server environment.
These are the areas of SQL Server activity that can be separated across different hard disk drives, RAID controllers, PCI channels, or combinations of the three:
• Transaction log files
• tempdb
• Database files
• Tables associated with considerable query or write activity
• Nonclustered indexes associated with considerable query or write activity
The physical separation of SQL Server I/O activities is made convenient by using hardware RAID controllers, RAID hot plug drives, and online RAID expansion. The approach that provides the most flexibility is arranging the RAID controllers so that a separate RAID SCSI channel is provided for each database activity. Each RAID SCSI channel should be attached to a separate RAID hot plug cabinet to take full advantage of online RAID expansion (if it is available through the RAID controller). Windows logical drive letters are associated with each RAID array and SQL Server files can be separated between distinct RAID arrays based on known I/O usage patterns.
With this configuration you can relate disk queuing to a distinct RAID SCSI channel and its drive cabinet as Performance Monitor reports the queuing behavior during load testing or heavy production loads. If a RAID controller and drive array cabinet support online RAID expansion and slots for hot-plug hard disk drives are available in the cabinet, disk queuing on that RAID array is resolved by adding more drives to the RAID array until Performance Monitor reports that disk queuing for that RAID array has reached acceptable levels (less than 2 for SQL Server files.) This can be done while SQL Server is online.
The tempdb database is created by SQL Server to be a shared working area for a variety of activities, including temporary tables, sorting, subqueries and aggregates with GROUP BY or ORDER BY, queries using DISTINCT (temporary worktables must be created to remove duplicate rows), cursors, and hash joins. You should enable the tempdb database I/O operations to occur in parallel to the I/O operations of related transactions. Because tempdb is a scratch area and update-intensive, RAID 5 is not as good a choice for tempdb as RAID 1 or 0+1. The tempdb database is rebuilt every time the database server is restarted, so RAID 0 is a possibility for tempdb on production SQL server computers. RAID 0 provides the best RAID performance for the tempdb database with the least physical drives. The concern with using RAID 0 for tempdb in a production environment is that SQL Server must be stopped and restarted if physical drive failure occurs in the RAID 0 array and this does not necessarily occur if tempdb is placed on a RAID 1 or 0+1 array.
To move the tempdb database, use the ALTER DATABASE command to change the physical file location of the SQL Server logical file name associated with the tempdb database. For example, to move the tempdb database and its associated log to the new file locations E:\Mssql and C:\Temp, use these commands:
alter database tempdb modify file (name='tempdev',filename= 'e:\mssql\tempnew_location.mDF')
alter database tempdb modify file (name='templog',filename= 'c:\temp\tempnew_loglocation.LDF')
The master, msdb, and model databases are not used much during production compared to user databases, so thay are not typically a consideration in I/O performance tuning. The master database is used only for adding new logins, databases, and other system objects.
Nonclustered indexes reside in B-tree structures that can be separated from their related database tables with the ALTER DATABASE statement. In this example, the first ALTER DATABASE creates a filegroup. The second ALTER DATABASE creates a file with a separate physical location associated with the filegroup. At this point, indexes can be created on the filegroup as illustrated by creating the index called index1. The sp_helpfile stored procedure reports files and filegroups present for a given database. The sp_help tablename has a section in its output that provides information about the table indexes and filegroup relationships.
alter database testdb add filegroup testgroup1
alter database testdb add file (name = 'testfile',
filename = 'e:\mssql\test1.ndf') to filegroup testgroup1
create table test1(col1 char(8))
create index index1 on test1(col1) on testgroup1

sp_helpfile
sp_help test1
For more information, see these topics and/or keywords in SQL Server Books Online: ALTER DATABASE, sp_helpfile, Files and Filegroups, Placing Indexes on Filegroups, Monitoring Disk Activity, Physical Database Files and Filegroups, adding and deleting data, and transaction log files.
SQL Server Indexes
This section contains information about how SQL Server data and index structures are physically placed on disk drives and how these structures apply to disk I/O performance.
SQL Server data and index pages are each 8 KB. SQL Server data pages contain all of the data associated with rows of a table, except text and image data. For text and image data, the SQL Server data page, which contains the row associated with the text or image column, contains a pointer to a B-tree structure of one or more 8-KB pages.
SQL Server index pages contain only data from columns that comprise a particular index, therefore index pages compress information associated with many more rows into an 8-KB page than does an 8-KB data page. The I/O performance benefit of indexes is a result of this information compression. If the columns picked to be part of an index form a low percentage of the rowsize of the table then information about more rows can be compressed in an 8-KB index page, which has performance benefits. When a SQL query requests a set of rows from a table in which columns in the query match values in the rows, SQL Server can save I/O operations by reading the index pages for the values and then accessing only the rows in the table required to satisfy the query. Then SQL Server does not have to perform I/O operations to scan all rows in the table to locate the required rows.
SQL Server indexes are built upon B-tree structures formed out of 8-KB index pages. The difference between clustered and nonclustered indexes is at the bottom of the B-tree structures, (referred to as leaf level). The upper parts of index B-tree structures are referred to as nonleaf levels of the index. A B-tree structure is built for every single index defined on a SQL Server table.
The illustration shows the structural difference between nonclustered and clustered indexes. In a nonclustered index, the leaf-level nodes contain only the data that participates in the index. In the nonclusterered index leaf-level nodes, index rows contain pointers to the remaining row data on the associated data page. At worst, each row access from the nonclustered index requires an additional nonsequential disk I/O to retrieve the row data. At best, many of the required rows will be on the same data page and thus allow retrieval of several required rows with each data page fetched. In the clustered index, the leaf-level nodes of the index are the actual data rows for the table. Therefore, no bookmark lookups are required for retrieval of table data. Range scans based on clustered indexes perform well because the leaf level of the clustered index (and hence all rows of that table) is physically ordered on disk by the columns that comprise the clustered index and will perform I/O in 64-KB extents. These 64-KB I/Os are physically sequential if there is not a lot of page splitting on the clustered index B-tree (nonleaf and leaf levels). The dotted lines indicate there are other 8-KB pages present in the B-tree structures but they are not shown.

Clustered Indexes
There can be only one clustered index per table because, while the upper parts of the clustered index B-tree structure are organized like the nonclustered index B-tree structures, the bottom level of the clustered index B-tree consists of the actual 8-KB data pages associated with the table. There are performance implications:
• Retrieval of SQL data based on key search with a clustered index requires no bookmark lookup (and a likely nonsequential change of location on the hard disk) to get to the associated data page, because the leaf level of the clustered index is already the associated data page.
• The leaf level of the clustered index is sorted by the columns that comprise the clustered index. Because the leaf level of the clustered index contains the actual 8-KB data pages of the table, the row data of the entire table is physically arranged on the disk drive in the order determined by the clustered index. This provides a potential I/O performance advantage when fetching a significant number of rows from tables greater than 64-KB based on the value of the clustered index, because sequential disk I/O is being used unless page splitting is occuring on this table. For more information about page-splitting, see “FILLFACTOR and PAD_INDEX” later in this document. You should pick the clustered index on a table based on a column that is used to perform range scans to retrieve a large number of rows.
Nonclustered Indexes
Nonclustered indexes are most useful for fetching few rows with good selectivity from large SQL Server tables based on a key value. Nonclustered indexes are B-trees formed out of 8-KB index pages. The bottom or leaf level of the B-tree of index pages contains all the data from the columns that comprised that index. When a nonclustered index is used to retrieve information from a table based on a match with the key value, the index B-tree is traversed until a key match is found at the leaf level of the index. A bookmark lookup is made if columns from the table are needed that did not form part of the index. This bookmark lookup will likely require a nonsequential I/O operation on the disk. It might even require the data to be read from another disk if the table and its accompanying index B-tree(s) are large. If multiple bookmark lookup lead to the same 8-KB data page, then there is less of an I/O performance penalty since it is only necessary to read the page into data cache once. For each row returned for a SQL query that involves searching with a nonclustered index, one bookmark lookup is required. These bookmark lookups are the reason that nonclustered indexes are better suited for SQL queries that return only one or a few rows from the table. Queries that require many rows to be returned are better served with a clustered index.
For more information, see this keyword in SQL Server Books Online: nonclustered index.
Covering Indexes
A special situation that occurs with nonclustered indexes is called the covering index. A covering index is a nonclustered index built upon all of the columns required to satisfy a SQL query, both in the selection criteria and in the WHERE clause. Covering indexes can save I/O and improve query performance. But you must balance the costs of creating a new index (with its associated B-tree index structure maintenance) with the I/O performance gain the covering index will bring. If a covering index will benefit a query or a set of queries that run often on SQL Server, then creating the covering index may be worthwhile.
Example:
SELECT col1,col3 FROM table1 WHERE col2 = 'value'
CREATE INDEX indexname1 ON table1(col2,col1,col3)
or
From SQL Server Enterprise Manager, use the Create Index Wizard.
The indexname1 index in this example is a covering index because it includes all columns from the SELECT statement and the WHERE clause. During the execution of this query, SQL Server does not need to access the data pages associated with table1. SQL Server can obtain all of the information required to satisfy the query by using the index called indexname1. When SQL Server has traversed the B-tree associated with indexname1 and has found the range of index keys where col2 is equal to value, SQL Server fetches all of required data (col1,col2,col3) from the leaf level of the covering index. This provides I/O performance in two ways:
• SQL Server obtains all required data from an index page, not a data page, so the data is more compressed and SQL Server saves disk I/O operations.
• The covering index organizes all of the required data by col2 physically on the disk. The hard disk drives return all of the index rows associated with the WHERE clause (col2 = value) in sequential order, which gives better I/O performance. From a disk I/O standpoint a covering index becomes a clustered index for this query and any other query that can be satisfied completely by the columns in the covering index.
If the number of bytes from all the columns in the index is small compared to the number of bytes in a single row of that table, and you are certain the query taking advantage of the covered index will be executed frequently, then it may make sense to use a covering index. But, before building a lot of covered indexes, consider how SQL Server can effectively and automatically create covered indexes for queries on the fly.
Automatic Covering Indexes or Covered Queries
The Microsoft SQL Server Query Processor provides index intersection. Index intersection allows the query processor to consider multiple indexes from a given table, build a hash table based on those multiple indexes, and utilize the hash table to reduce I/O for a query. The hash table that results from the index intersection becomes a covering index and provides the same I/O performance benefits that covering indexes do. Index intersection provides greater flexibility for database user environments in which it is difficult to determine all of the queries that will run against the database. A good strategy in this case is to define single column, nonclustered indexes on all columns that will be queried frequently and let index intersection handle situations in which a covered index is needed.
For more information, see these topics in SQL Server Books Online: Query Tuning Recommendations and Designing an Index.
Example:
SELECT col3 FROM table1 WHERE col2 = 'value'
CREATE INDEX indexname1 ON table1(col2)
CREATE INDEX indexname2 ON table1(col3)
or
From SQL Server Enterprise Manager, use the Create Index Wizard.
In this example, indexname1 and indexname2 are nonclustered, single column indexes created on the SQL Server table called table1. When the query executes, the query processor recognizes that index intersection using the two indexes is advantgeous. The query optimizer automatically hashes the two indexes together to save I/O while executing the query. No query hints were required. Queries handled by covering indexes (whether by explicitly declared covering indexes or index intersection) are called covered queries.
Index Selection
How indexes are chosen significantly affects the amount of disk I/O generated and, subsequently, performance. Nonclustered indexes are appropriate for retrieval of a few rows and clustered indexes are good for range scans. In addition, you should try to keep indexes compact (few columns and bytes). This is especially true for clustered indexes because nonclustered indexes use the clustered index to locate row data. For more information, see these topics in SQL Server Books Online: Using Clustered Indexes, Index Tuning Recommendations, and Designing an Index.
Consider selectivity for nonclustered indexes because if a nonclustered index is created on a large table with only a few unique values, use of that nonclustered index does not save I/O during data retrieval. In fact, using the index will cause much more I/O than a sequential table scan of the table. Possible candidates for a nonclustered index include invoice numbers, unique customer numbers, social security numbers, and telephone numbers.
Clustered indexes are much better than nonclustered indexes for queries that match columns or search for ranges of columns that do not have many unique values, because the clustered index physically orders the table data and allows for sequential 64-KB I/O on the key values. Possible candidates for a clustered index include states, company branches, date of sale, zip codes, and customer district. Defining a clustered index on the columns that have unique values is not beneficial unless typical queries on the system fetch large sequential ranges of the unique values. To pick the best column on each table to create the clustered index ask if there will be many queries that must fetch many rows based on the order of this column. The answer is very specific to each user environment. One company may do more queries based on ranges of dates whereas another company may do many queries based on ranges of bank branches.
These are examples of WHERE clauses that benefit from clustered indexes:
…WHERE > some_value
…WHERE BETWEEN some_value AND some_value
…WHERE < some_value
Clustered Index Selection
Clustered index selection really involves two major steps. First, determine the column of the table that will benefit most from the clustered index by providing sequential I/O for range scans, and second, use the clustered index to affect the physical placement of table data while avoiding hot spots. A hot spot occurs when data is placed on hard disk drives so many queries try to read or write data in the same area of the disk(s) at the same time. This creates a disk I/O bottleneck because more concurrent disk I/O requests are received by the hard disk than it can handle. The solution is either to stop fetching as much data from this disk or to spread the data across multiple disks to support the I/O demand. This consideration for the physical placement of data can be critical for good concurrent access to data among hundreds or thousands of SQL Server users.
These two decisions often conflict with one another and the best decision is to balance the two. In high user load environments, improved concurrency (by avoiding hot spots) can be more valuable than the performance benefit gained by placing the clustered index on that column.
With SQL Server , nonclustered indexes will use the clustered index to locate data rows if there is a clustered index present on the table. Since all nonclustered indexes need to hold the clustered keys within their B-tree structures, it is better for performance to keep the overall byte size of the clustered index keys as small as possible. Keep the number of columns in the clustered index to a minimum and carefully consider the byte size of each of the columns chosen to be included in a clustered index. This helps reduce the size of the clustered index and subsequently, all nonclustered indexes on a table. Smaller index B-tree structures can be read quicker and help improve performance. For more information, see this topic in SQL Server Books Online: Using Clustered Indexes.
In earlier versions of SQL Server, tables without a clustered index (called heaps) inserted rows are placed at the end of the table on disk. This created the possibility of a hot spot at the end of a busy table. The SQL Server storage management algorithms provide free space management that removes this behavior. When rows are inserted in heaps, SQL Server uses the Page Free Space (PFS) pages to quickly locate available free space in the table in which the row is inserted. PFS pages find free space throughout the table, which recovers deleted space and avoids insertion hot spots. Free space management affects clustered index selection. Because clustered indexes affect physical data placement, hot spots can occur when a clustered index physically sequences based on a column where many concurrent inserts occur at the highest column value and are at the same physical disk location. For columns with monotonically increasing values, a clustered index sequentially orders data rows on disk by that column. By placing the clustered index on another column or by not including a clustered index on the table, this sequential data placement moves to another column or does not occur at all.
Another way to think about hot spots is within the context of selects. If many users select data with key values that are very close to but are not in the same rows, most disk I/O activity will occur within the same physical region of the disk I/O subsystem. This disk I/O activity can be spread out more evenly by defining the clustered index for this table on a column that spreads these key values evenly across the disk. If all selects are using the same unique key value, then using a clustered index does not help balance the disk I/O activity of this table. By using RAID (either hardware or software) you can alleviate this problem by spreading the I/O across many disk drives. This behavior can be described as disk access contention. It is not locking contention.
Clustered Index Selection Scenario
A scenario can illustrate clustered index selection. For example, a table contains an invoice date column, a unique invoice number column, and other data. About 10,000 new records are inserted into this table every day and the SQL queries often search this table for all records for one week of data. Many users have concurrent access to this table. The invoice number is not a candidate for the clustered index. The invoice number is unique and users do not usually search on ranges of invoice numbers, so placing invoice numbers physically in sequential order on disk is not appropriate. Next, the values for invoice number increase monotonically (1001,1002,1003, and so on). If the clustered index is placed on invoice number, inserts of new rows into this table occur at the end of the table beside the highest invoice number on the same physical disk location and create a hot spot.
Consider the invoice date column. To maximize sequential I/O, the invoice date column is a candidate for a clustered index because users often search for one week of data (about 70,000 rows). But from a concurrency perspective, the invoice date column may not be a candidate for the clustered index. If the clustered index is placed on an invoice date, all data tends to be inserted at the end of the table, and a hot spot can occur on the hard disk that holds the end of the table. The insertions at the end of the table are offset by the 10,000 rows that are inserted for the same date, so invoice date is less likely to create a hot spot than invoice number. Also, a hardware RAID controller helps spread out the 10,000 rows across multiple disks, which can also minimize the possibility of an insertion hot spot.
There is no perfect answer to this scenario. You can place the clustered index on invoice date to speed up queries involving invoice date ranges even at the risk of hot spots. In this case, you should monitor disk queuing on the disks associated with this table for possible hot spots. It is recommended that you define the clustered index on invoice date because of the benefit to range scans based on invoice date and so that invoice numbers are not physically sequential on disk.
In this example, a table consists of invoice number, invoice date, invoice amount, sales office where the sale originated, and other data. Suppose 10,000 records are inserted into this table every day and users often query invoice amounts based on sales office. Sales office should be the column on which the clustered index is created because that is the range on which scans are based. Newly inserted rows will have a mix of sales offices; inserts should be spread evenly across the table and across the disks on which the table is located.
In some cases, range scans may not be an issue. For example, a very large employee table has employee number, social security number, and other data. As rows are inserted, employee number is incremented. There are 100,000 retrievals from this table every day and each retrieval is a single record fetch based on social security number. A nonclustered index created on social security number provides excellent query performance in this scenario. A clustered index on social security number provides slightly better query performance than the nonclustered index but may be excessive because range scans are not involved. If there will be only one index on this table, place the clustered index on the social security number column. The question then is whether to define a clustered index on this table. In earlier versions of SQL Server, it was important to define a clustered index on a table even if it was not required for queries because it helped with deleted row space recovery. This is not an issue with the SQL Server space allocation algorithms and storage structures.
The recommendation in this example is to create the clustered index on social security number. The reason is that the social security number has data distributed so it does not follow the sequential pattern of employee number and social security number tends to have an even distribution. If a clustered index is created on this evenly distributed column data, then the employee records will be evenly distributed on disk. This distribution, in conjunction with FILLFACTOR and PAD_INDEX, which will be discussed later, provides open data page areas throughout the table to insert data. Assuming that newly inserted employee records have an even distribution of social security numbers, the employee table fills evenly and page splitting is avoided. If a column with even distribution does not exist on the table, it is worthwhile to create an integer column on the table and populate the column with values that are evenly distributed and then create the clustered index column. This “filler” or “dummy” column with a clustered index defined on it is not being used to query, but to distribute data I/O across disk drives evenly to improve table access concurrency and overall I/O performance. This can be an effective methodology with large and heavily accessed SQL tables. Another possible solution in this example is to not to create a clustered index on this table. In this case, SQL Server manages all aspects of the space management. SQL Server finds a free space to insert the row, reuses space from deleted rows, and automatically reorganizes physical ordering of data pages on disk when it makes sense (to allow greater amounts of sequential I/O). The reorganization of data pages happens during database file autoshrink operations. For more information, see these topics in SQL Server Books Online: Managing Space Used by Objects and Space Allocation and Reuse.
FILLFACTOR and PAD_INDEX
If a SQL Server database is experiencing a large amount of insert activity, you should plan to provide and maintain open space on index and data pages to prevent page splitting. Page splitting occurs when an index page or data page can no longer hold any new rows and a row must be inserted into the page because of the logical ordering of data defined in that page. When this occurs, SQL Server must divide the data on the full page and move about half of the data to a new page so that both pages have some open space. This consumes system resources and time.
When indexes are built initially, SQL Server places the index B-tree structures on contiguous physical pages, which supports optimal I/O performance by scanning the index pages with sequential I/O. When page splitting occurs and new pages must be inserted into the logical B-tree structure of the index, SQL Server must allocate new 8-KB index pages somewhere. This occurs elsewhere on the hard disk drive and breaks up the physically sequential index pages. This switches I/O operations from sequential to nonsequential and cuts performance in half. Excessive page splitting should be resolved by rebuilding the index to restore the physically sequential order of the index pages. This same behavior can be encountered on the leaf level of the clustered index, which affects the data pages of the table.
Use Performance Monitor to watch the SQL Server: Access Methods - Page Splits counter. Nonzero values for this counter indicate page splitting. Further analysis should be done with the DBCC SHOWCONTIG statement. For more information about how to use this statement, see this topic in SQL Server Books Online: DBCC SHOWCONTIG.
The DBCC SHOWCONTIG statement can reveal whether excessive page splitting has occurred on a table. Scan Density is the key indicator that DBCC SHOWCONTIG provides. This value should be as close to 100 percent as possible. If this value is below 100 percent, rebuild the clustered index on that table by using the DROP_EXISTING option to defragment the table. The DROP_EXISTING option of the CREATE INDEX statement permits re-creation of existing indexes and provides better index rebuild performance than dropping and recreating the index. For more information, see these topics in SQL Server Books Online: CREATE INDEX and Rebuilding an Index.
The FILLFACTOR option on the CREATE INDEX and DBCC DBREINDEX statements provides a way to specify the percentage of open space to leave on index and data pages. The PAD_INDEX option for CREATE INDEX applies what has been specified for FILLFACTOR on the nonleaf-level index pages. Without the PAD_INDEX option FILLFACTOR mainly affects the leaf-level index pages of the clustered index. You should use the PAD_INDEX option with FILLFACTOR. For more information, see these keywords in SQL Server Books Online: page split and PAD_INDEX.
The optimal value to specify for FILLFACTOR depends on how much new data is inserted into an 8-KB index and data page within a given timeframe. Keep in mind that SQL Server index pages typically contain many more rows than data pages because index pages contain only the data for columns associated with that index whereas data pages hold the data for the entire row. Also consider how often there will be a maintenance window that permits the rebuilding of indexes to avoid page splitting. Strive to rebuild the indexes only as the majority of the index and data pages have become filled with data by properly selecting clustered index for a table. If the clustered index distributes data evenly so that new row inserts into the table occur across all of the data pages associated with the table, then the data pages will fill evenly. This provides time before page splitting occurs and you must rebuild the clustered index. FILLFACTOR should be selected based partly on the estimated number of rows that will be inserted within the key range of an 8-KB page in a certain time frame and partly by how often scheduled index rebuilds can occur on the system.
You must make a decision based on the performance trade-offs between leaving a lot of open space on pages and page splitting. A small specified percentage for FILLFACTOR leaves large open spaces on the index and data pages, which helps avoid page splitting but also negates some performance gained by compressing data onto a page. SQL Server performs faster if more data is compressed on index and data pages because it can fetch more data with fewer pages and I/Os if the data is compressed on the pages. Specifying too high a FILLFACTOR may leave too little open space on pages and can allow pages to overflow too quickly, which causes page splitting.
Before using FILLFACTOR and PAD_INDEX, remember that reads tend to outnumber writes, even in an online transaction processing (OLTP) system. Using FILLFACTOR slows down all reads, because it spreads tables over a wider area (reduction of data compression). Before using FILLFACTOR and PAD_INDEX, you should use Performance Monitor to compare SQL Server reads to SQL Server writes and use these options only if writes are a substantial fraction of reads (more than 30 percent).
If writes are a substantial percentage of reads, the best approach in a busy OLTP system is to specify as high a FILLFACTOR as will leave a minimal amount of free space per 8-KB page but still prevent page splitting and allow the SQL Server to reach the next available time window for rebuilding the index. This method balances tuning I/O performance (keeping the pages as full as possible) and avoiding page splits (not letting pages overflow). You can experiment by rebuilding the index with varying FILLFACTOR values and then simulating load activity on the table to validate an optimal value for FILLFACTOR. After the optimal FILLFACTOR value has been determined, automate the scheduled rebuilding of the index as a SQL Server task. For more information, see this keyword in SQL Server Books Online: creating a task.
SQL Server Performance Tuning Tools
Microsoft SQL Server version includes several tools that can assist database administrators with performance tuning.
Sample Data and Workload
This example shows how to use SQL Server performance tools. First, the table is constructed:
CREATE TABLE testtable (nkey1 int IDENTITY, col2 char(300) DEFAULT 'abc', ckey1 char(1))
Next, the table is loaded with 10,000 rows of test data:
DECLARE @counter int
SET @counter = 1
WHILE (@counter <= 2000)
BEGIN
INSERT testtable (ckey1) VALUES ('a')
INSERT testtable (ckey1) VALUES ('b')
INSERT testtable (ckey1) VALUES ('c')
INSERT testtable (ckey1) VALUES ('d')
INSERT testtable (ckey1) VALUES ('e')
SET @counter = @counter + 1
END
These queries comprise the database server workload:
SELECT ckey1,col2 FROM testtable WHERE ckey1 = 'a'
select nkey1,col2 FROM testtable WHERE nkey1 = 5000
SQL Server Profiler
SQL Server Profiler records detailed information about activity occurring on the database server. SQL Server Profiler can be configured to watch and record one or many users executing queries on SQL Server and to provide a widely configurable amount of performance information, including I/O statistics, CPU statistics, locking requests, Transact-SQL and RPC statistics, index and table scans, warnings and errors raised, database object create/drop, connection connect/disconnects, stored procedure operations, cursor operation, and more. For more information about what SQL Server Profiler can record, see this keyword in SQL Server Books Online: SQL Server Profiler.
Using SQL Server Profiler with Index Tuning Wizard
SQL Server Profiler and Index Tuning Wizard can be used together to help database administrators create proper indexes on tables. SQL Server Profiler records resource consumption for queries into a .trc file. The .trc file can be read by Index Tuning Wizard, which evaluates the .trc information and the database tables, and then provides recommendations for indexes that should be created. Index Tuning Wizard can either automatically create the proper indexes for the database by scheduling the automatic index creation or generate a Transact-SQL script that can be reviewed and executed later.
These are the steps for analyzing a query load:
 To set up SQL Server Profiler (Enterprise Manager)
1. On the Tools menu, click SQL Server Profiler.
2. On the File menu, point to New, and then click Trace.
3. Type a name for the trace.
4. Select Capture to file, then select a .trc file to which to output the SQL Server Profiler information.

 To run the workload (Enterprise Manager)
1. On the Tools menu, click SQL Server Query Analyzer.
2. Connect to SQL Server and set the current database to be where the table was created.
3. Enter these queries into the query window of SQL Server Query Analyzer:
SELECT ckey1,col2 FROM testtable WHERE ckey1 = 'a'
SELECT nkey1,col2 FROM testtable WHERE nkey1 = 5000
1. On the Query menu, click Execute.

 To stop SQL Server Profiler
1. On the File menu, click Stop Traces.
2. In the Stop Selected Traces dialog box, choose the traces to stop.

 To load the .trc file into the Index Tuning Wizard (SQL Server Profiler)
1. On the Tools menu, click Index Tuning Wizard and then click Next.
2. Select the database to analyze, then click Next.
3. Make sure I have a saved workload file is selected, then click Next.
4. Select My workload file, locate the .trc file created with SQL Server Profiler, click OK, and then click Next.
5. In Select Tables to Tune, select the tables and then click Next.
6. In Index Recommendations, select the indexes to create, and then click Next.
7. Select the preferred option, then click Next.
8. Click Finish.
This is the Transact-SQL generated by Index Tuning Wizard for the sample database and workload:
/* Created by: Index Tuning Wizard */
/* Date: 9/7/98 */
/* Time: 6:42:00 PM */
/* Server: HENRYLNT2 */
/* Database : test */
/* Workload file : E:\Mssql\Binn\Profiler_load.sql */

USE [test]
BEGIN TRANSACTION
CREATE CLUSTERED INDEX [testtable2] ON [dbo].[testtable] ([ckey1])
if (@@error <> 0) rollback transaction
CREATE NONCLUSTERED INDEX [testtable1] ON [dbo].[testtable] ([nkey1])
if (@@error <> 0) rollback transaction
COMMIT TRANSACTION
The indexes recommended by Index Tuning Wizard for the sample table and data are expected. There are only five unique values for ckey1 and 2000 rows of each value. Because one of the sample queries (SELECT ckey1, col2 FROM testtable WHERE ckey1 = a) requires retrieval from the table based on one of the values in ckey1, it is appropriate to create a clustered index on the ckey1 column. The second query (SELECT nkey1, col2 FROM testtable WHERE nkey1 = 5000) fetches one row based on the value of the column nkey1. nkey1 is unique, and there are 10,000 rows; therefore, it is appropriate to create a nonclustered index on this column.
SQL Server Profiler and Index Tuning Wizard are powerful tools in database server environments in which there are many tables and queries involved. Use SQL Server Profiler to record a .trc file while the database server is experiencing a representative set of queries. Then load the .trc file into Index Tuning Wizard to determine the proper indexes to build. Follow the prompts in Index Tuning Wizard to automatically generate and schedule index creation jobs to run at off-peak times. Run SQL Server Profiler and Index Tuning Wizard regularly (perhaps weekly) to see if queries executed on the database server have changed significantly, thus possibly requiring different indexes. Regular use of SQL Server Profiler and Index Tuning Wizard can keep SQL Server running in top form as query workloads change and database size increase over time.
For more information, see these topics in SQL Server Books Online: Index Tuning Wizard and Index Tuning Recommendations.
Analyzing SQL Server Profiler Information
SQL Server Profiler provides an option to log information into a SQL Server table. When it is complete, the table can be queried to determine if specific queries are using excessive resources.
 To log SQL Server Profiler information into a SQL Server table (Enterprise Manager)
1. On the Tools menu, click SQL Server Profiler.
2. On the File menu, point to New, and then click Trace.
3. Type a name for the trace, then select Capture to Table.
4. In the Capture to Table dialog box, enter a SQL Server table name to which to output the SQL Server Profiler information. Click OK.
5. On the File menu, click Stop Traces.
6. In the Stop Traces dialog box, choose the traces to stop.
For more information, see these topics in SQL Server Books Online: Viewing and Analyzing Traces, Troubleshooting SQL Server Profiler, Tips for Using SQL Server, Common SQL Server Profiler Scenarios, Starting SQL Server Profiler, and Monitoring with SQL Server Profiler.
SQL Server Query Analyzer
After the information is recorded into the SQL Server table, you can use SQL Server Query Analyzer to determine which queries on the system are consuming the most resources, and database administrators can concentrate on improving the queries that need the most help. For example, this query is typical of the analysis done on data recorded from SQL Server Profiler into a SQL Server table:
select top 3 TextData,CPU,Reads,Writes,Duration from profiler_out_table order by cpu desc
The query retrieves the top three consumers of CPU resources on the database server. Read and write I/O information, along with the duration of the queries in milliseconds is also returned. If a large amount of information is recorded with the SQL Server Profiler, you should create indexes on the table to help speed analysis queries. For example, if CPU is going to be an important criteria for analyzing this table, you should create a nonclustered index on CPU column.
Statistics I/O
SQL Server Query Analyzer provides a Show stats I/O option under the General tab of the Connections Options dialog box. Select this checkbox for information about how much I/O is being consumed for the query just executed in SQL Server Query Analyzer.
For example, the query SELECT ckey1, col2 FROM testtable WHERE ckey1 = a returns this I/O information in addition to the result set when the Show stats I/O connection option is selected:
Table 'testtable'. Scan count 1, logical reads 400, physical reads 382, read-ahead reads 400.
Similarly, the query SELECT nkey1, col2 FROM testtable WHERE nkey1 = 5000 returns this I/O information in addition to the result set when the Show stats I/O connection option is selected:
Table 'testtable'. Scan count 1, logical reads 400, physical reads 282, read-ahead reads 400.
Using STATISTICS I/O is a good way to monitor the effect of query tuning. For example, create the two indexes on this sample table as recommended by Index Tuning Wizard and then run the queries again.
In the query SELECT ckey1, col2 FROM testtable WHERE ckey1 = a, the clustered index improved performance as indicated below. The query must fetch 20 percent of the table; therefore, the performance improvement is reasonable.
Table 'testtable'. Scan count 1, logical reads 91, physical reads 5, read-ahead reads 32.
In the query SELECT nkey1, col2 FROM testtable WHERE nkey1 = 5000, the creation of the nonclustered index had a dramatic effect on the performance of the query. Because only one row of the 10,000 row table must be retrieved for this query, the performance improvement with the nonclustered index is reasonable.
Table 'testtable'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0.
ShowPlan
ShowPlan can be used to display detailed information about what the query optimizer is doing. SQL Server provides text and graphical versions of ShowPlan. Graphical ShowPlan output can be displayed in the Results pane of SQL Server Query Analyzer by executing a Transact-SQL query with Ctrl+L. Icons indicate the operations that the query optimizer will perform if it executes the query. Arrows indicate the direction of data flow for the query. Details about each operation can be displayed by holding the mouse pointer over the operation icon. The equivalent information can be displayed in text-based ShowPlan by executing SET SHOWPLAN_ALL ON. To reduce the query optimizer operation details from text-based ShowPlan, execute SET SHOWPLAN_TEXT ON.
For more information, see these topics and/or keywords in SQL Server Books Online: Understanding Nested Loops Joins, worktables, and showplan.
Examples of ShowPlan Output
Use the previous example queries and the set showplan_text on option in SQL Server Query Analyzer:
Query:
SELECT ckey1,col2 FROM testtable WHERE ckey1 = 'a'
Text-based ShowPlan output:
|--Clustered Index Seek(OBJECT:([test].[dbo].[testtable].[testtable2]), SEEK:([testtable].[ckey1]='a') ORDERED)
This query takes advantage of the clustered index on column ckey1, as indicated by Clustered Index Seek.
Equivalent graphical showplan output:

If the clustered index is removed from the table, the query must use a table scan. The ShowPlan output below indicates the change in behavior:
Text-based showplan output:
|--Table Scan(OBJECT:([test].[dbo].[testtable]), WHERE:([testtable].[ckey1]='a'))
Equivalent graphical showplan output:

Table scans are the most efficient way to retrieve information from small tables. But on larger tables, table scans indicated by ShowPlan are a warning that the table may need better indexes or that the existing indexes must have their statistics updated (by using the UPDATE STATISTICS statement). SQL Server provides automatically updating indexes. You should let SQL Server automatically maintain index statistics because the maintenance helps guarantee queries will always work with good index statistics.
Query:
SELECT nkey1,col2 FROM testtable WHERE nkey1 = 5000
Text-based showplan output:
|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([test].[dbo].[testtable]))
|--Index Seek(OBJECT:([test].[dbo].[testtable].[testtable1]), SEEK:([testtable].[nkey1]=5000) ORDERED)
Equivalent graphical showplan output:

This query uses the nonclustered index on the column nkey1, which is indicated by the Index Seek operation on the column nkey1. The Bookmark Lookup operation indicates that SQL Server must perform a bookmark lookup from the index page to the data page of the table to retrieve the requested data. The bookmark lookup was required because the query asked for the column col2, which was not part of the nonclustered index.
Query:
SELECT nkey1 FROM testtable WHERE nkey1 = 5000
Text-based showplan output:
|--Index Seek(OBJECT:([test].[dbo].[testtable].[testtable1]), SEEK:([testtable].[nkey1]=[@1]) ORDERED)
Equivalent graphical showplan output:

This query uses the nonclustered index on nkey1 as a covering index. No bookmark lookup operation was needed for this query because all of the information required for the query (both SELECT and WHERE clauses) is provided by the nonclustered index. No bookmark lookup to the data pages is not required from the nonclustered index pages. I/O is reduced in comparison to the case in which a bookmark lookup was required.
Performance Monitor
Performance Monitor provides information about Windows and SQL Server operations occurring on the database server. For SQL Server specific counters, see your SQL Server documentation.
In Performance Monitor graph mode, note the Max and Min values. Do not emphasize the average because polarized data points can affect this value. Study the graph shape and compare to Min and Max to gather an accurate understanding of the behavior. Use BACKSPACE to highlight counters.
You can use Performance Monitor to log all available Windows NT and SQL Server performance monitor objects and counters in a log file and also view Performance Monitor interactively in chart mode. Setting a sampling interval determines how quickly the log file grows. Log files can get big fast (for example, 100 MG in one hour with all counters turned on and a sampling interval of 15 seconds). The test server should have a couple of gigabytes free to store these files. But if conserving space is important, try running with a large log interval of 30 or 60 seconds so Performance Monitor does not sample the system as often, and all of the counters are resampled with reasonable frequency but a smaller log file size is maintained.
Performance Monitor also consumes a small amount of CPU and Disk I/O resources. If a system does not have a lot of disk I/O and/or CPU to spare, consider running Performance Monitor from another computer to monitor the SQL Server over the network. This applies to Performance Monitor graph mode only. When using log mode, it is more efficient to log Performance Monitor information locally on the SQL Server. If you must use log mode over the network, then consider reducing the logging to only the most critical counters.
You should log all counters available during performance test runs into a file for analysis later. Configure Performance Monitor to log all counters into a log file and at the same time monitor the most interesting counters in one of the other modes, such as graph mode. Then all of the information is recorded but the most interesting counters are presented in an uncluttered Performance Monitor graph while the performance run is taking place.
 To start the logging feature (Performance Monitor)
1. On the View menu, click Log.
2. Click (+) button.
3. In the Add to Log dialog box, select the counters to add to the log.
4. Click Add, then click Done.
5. On the Options menu, click Log.
6. In File Name, enter the name of the file into which the performance information will be logged.
7. Click Start Log.

 To stop the logging feature (Performance Monitor)
1. On the Options menu, click Log.
2. Click Stop Log.

 To load the logged information into Performance Monitor (Performance Monitor)
1. On the View menu, click Log.
2. On the Options menu, click Data From, and then select Log File.
3. Click the browse (…) button, then double-click the file.
4. In the Data From dialog box, click OK.
5. On the View menu, click Chart, and then click the (+) button.
6. Click on the button with the + sign on it.
7. In the Add to Chart dialog box, add desired counters to the graphical display by highlighting the object/counter combination, and then click Add.

 To relate Performance Monitor logged events back to a point in time
1. Follow the steps for How to load information into Performance Monitor.
2. On the Edit menu, click Time Window.
3. In the Input Log File Timeframe dialog box, you can adjust the start and stop time window of the logged data by clicking and holding down the mouse button on the slidebars.
4. Click OK to reset the chart to display only data logged for the selected time window.

Key Performance Monitor Counters
You can observe several Performance Monitor disk counters. To enable these counters, run the diskperf –y command from a Windows NT command window and restart Windows NT. The diskperf -y command does consume some resources on the database server but it is worthwhile to run the diskperf -y command on all production SQL Server servers so disk queuing problems can be confirmed immediately and all Performance Monitor counters are immediately available to diagnose disk I/O issues. If disk I/O counters are required, the diskperf -y command must be executed and Windows NT must be restarted before the disk I/O counters will report data in Performance Monitor.
(Physical or Logical) Disk Queue > 2

Physical hard disk drives that experience disk queuing hold back disk I/O requests while they catch up on I/O processing. SQL Server response time is degraded for these drives, which costs query execution time.

If you use RAID, you must know how many physical hard disk drives are associated with each drive array that Windows NT interprets as a single physical drive so you can calculate disk queuing per physical drive. Ask a hardware expert to explain the SCSI channel and physical drive distribution in order to understand how SQL Server data is held by each physical drive and how much SQL Server data is distributed on each SCSI channel.
There are several choices for looking at disk queuing in Performance Monitor. Logical disk counters are associated with the logical drive letters assigned by Disk Administrator, whereas physical disk counters are associated with what Disk Administrator sees as a single physical disk device. What looks like a single physical device to Disk Administrator may be either a single hard disk drive, or a RAID array, which consists of several hard disk drives. The Current Disk Queue counter is an instantaneous measure of disk queuing, whereas the

Average Disk Queue counter averages the disk queuing measurement over the Performance Monitor sampling period. Take note of any counter in which
Logical Disk: Average Disk Queue is greater than 2,
Physical Disk: Average Disk Queue is greater than 2,
Logical Disk: Current Disk Queue is greater than 2, or
Physical Disk: Average Disk Queue is greater than 2.

These recommended measurements are specified per physical hard disk drive. If a RAID array is associated with a disk queue measurement, the measurement must be divided by the number of physical hard disk drives in the RAID array to determine the disk queuing per physical hard disk drive.
On physical hard disk drives or RAID arrays that hold SQL Server log files, disk queuing is not a useful measure because SQL Server log manager does not queue more than a single I/O request to SQL Server log file(s).
For more information, see this topic in SQL Server Books Online: Monitoring Disk Activity.
System: Processor Queue Length > 2 (per CPU)

When this counter is greater than 2, the server processors are receiving more work requests than they can handle as a group, therefore Windows must place these requests in a queue.
Some processor queuing can be an indicator of good SQL Server I/O performance. If there is no processor queuing and if CPU use is low, it can be an indication of a performance bottleneck somewhere in the system, and most likely in the disk subsystem. A reasonable amount of work in the processor queue means that the CPUs are not idle and the rest of the system is keeping pace with the CPUs.
A general rule for determining an optimal processor queue number is to multiply the number of CPUs on the database server by 2.
Processor queuing significantly above 2 per CPU should be investigated. Excessive processor queuing costs query execution time. Eliminating hard and soft paging can help save CPU resources. Other methods that help reduce processor queuing include tuning SQL queries, picking better SQL indexes to reduce disk I/O (and hence CPU), or adding more CPUs (processors) to the system.
Hard Paging - Memory: Pages/Sec > 0 or Memory: Page Reads/Sec > 5

Memory: Pages/Sec greater than 0 or Memory: Page Reads/Sec greater than 5 means that Windows is going to disk to resolve memory references (hard page fault), which costs disk I/O and CPU resources.
The Memory: Pages/Sec counter is a good indicator of the amount of paging that Windows is performing and the adequacy of the database server RAM configuration. A subset of the hard paging information in Performance Monitor is the number of times per second Windows had to read from the paging file to resolve memory references, which is represented by the Memory: Pages Reads/Sec counter.
A value of Memory: Pages Reads/Sec greater than 5 is bad for performance.
Automatic SQL Server memory tuning adjusts SQL Server memory use dynamically so that paging is avoided. A small number of pages per second is normal but excessive paging requires corrective action.

If SQL Server is automatically tuning memory, then adding more RAM or removing other applications from the database server are options to help bring the Memory: Pages/Sec counter to a reasonable level.

If SQL Server memory is being manually configured on the database server, it may be necessary to reduce memory given to SQL Server, remove other applications from the database server, or add more RAM to the database server.

Keeping Memory: Pages/Sec at or close to zero helps database server performance because Windows and all its applications (including SQL Server) are not going to the paging file to satisfy any data in memory requests so the amount of RAM on the server is sufficient. A Pages/Sec value slightly greater than 0 is not horrible but a relatively high performance penalty (Disk I/O) is paid every time data is retrieved from the paging file rather than from RAM.

You should understand the difference between the Memory: Pages Input/sec counter and the Memory: Pages Reads/sec counter. The Memory: Pages Input/sec counter indicates the actual number of Windows 4-KB pages being brought from disk to satisfy page faults. The Memory: Pages Reads/sec counter indicates how many disk I/O requests are made per second to satisfy page faults, which provides a slightly different point of view. So a single page read could contain several Windows 4-KB pages. Disk I/O performs better as the packet size of data increases (64 KB or more), so it can be worthwhile to consider these counters at the same time. You should also remember that for a hard disk drive, completing a single read or write of 4 KB costs almost as much time as a single read or write of 64 KB. Consider this situation: 200 page reads consisting of eight 4-KB pages per read could finish faster than 300 page reads consisting of a single 4-KB page. And we are comparing 1600 4-KB page reads finishing faster than 300 4-KB page reads. The key to all disk I/O analysis is to watch not only the number of disk bytes/sec, but also the disk transfers/sec. For more information, see
“Disk I/O Counters” and “The EMC Disk I/O Tuning Scenario” later in this document.

It is useful to compare the Memory: Page Input/sec counter to the Logical Disk: Disk Reads/sec counter across all drives associated with the Windows NT paging file, and the Memory: Page Output/sec counter to the Logical Disk: Disk Writes/sec counter across all drives associated with the Windows paging file. They provide a measure of how much disk I/O is strictly related to paging as opposed to other applications, such as SQL Server. Another way to isolate paging file I/O activity is to ensure the paging file is located on a separate set of drives from all other SQL Server files. Separating the paging file from the SQL Server files also can help disk I/O performance because disk I/O associated with paging can be performed in parallel to disk I/O associated with SQL Server.
Soft Paging - Memory: Pages Faults/Sec > 0

If the Memory: Pages Faults/Sec counter is greater than 2, it indicates that Windows NT is paging but includes hard and soft paging within the counter. Soft paging means that there are application(s) on the database server that request memory pages still inside RAM but outside of the application’s Working Set. The Memory: Page Faults/Sec counter is helpful for deriving the amount of soft paging that occurs. There is no counter called soft faults per second. Instead, calculate the number of soft faults happening per second this way:

Memory: Pages Faults/sec - Memory: Pages Input/sec = Soft Page Faults per Second

To determine if SQL Server is causing excessive paging, monitor the Process: Page Faults/sec counter for the SQL Server process and note whether the number of page faults per second for Sqlservr.exe is similar to the number of pages per second.
Because soft faults consume CPU resources, soft faults are better than hard faults for performance. Hard faults consume disk I/O resources. The best environment for performance is to have no faulting of any kind.
Until SQL Server accesses all of its data cache pages for the first time, the initial access to each page causes a soft fault. Do not be concerned if soft faulting occurs under these circumstances. For more information, see “Monitoring Processors” in this document.
For more information on memory tuning, see this topic in SQL Server Books Online: Monitoring Memory Usage.
Monitoring Processors

Keep all server processors busy enough to maximize performance but not so busy that processor bottlenecks occur. The performance tuning challenge is to determine the source of the bottleneck. If CPU is not the bottleneck, then a primary candidate is the disk subsystem, so the CPU is being wasted. CPU is the most difficult resource to expand above a specific configuration level, such as four or eight on many current systems, so it is a good sign when CPU utilization is above 95 percent. In addition, the response time of transactions should be monitored to ensure they are within reason; if they are not, then greater than 95 percent CPU use can mean that the workload is too much for the available CPU resources and either the CPU must be increased or the workload must be reduced or tuned.
Monitor the Processor: Processor Time % counter to ensure all processors are consistently below 95 percent utilization on each CPU. The System:Processor Queue counter is the processor queue for all CPUs on a Windows NT-based system. If the System: Processor Queue counter value is greater than 2 per CPU, there is a CPU bottleneck and it is necessary to either add processors to the server or reduce the workload on the system. Reducing workload can be accomplished by query tuning or improving indexes to reduce I/O and subsequently CPU usage.
Another counter to monitor when a CPU bottleneck is suspected is System: Context Switches/sec because it indicates the number of times per second that Windows NT and SQL Server had to change from executing on one thread to executing on another. Context switching is a normal component of a multithreaded, multiprocessor environment, but excessive context switching slows down a system. Only be concerned about context switching if there is processor queuing. If processor queuing is observed, use the level of context switching as a gauge when you performance tune SQL Server. Consider using the lighweight pooling option so that SQL Server switches to a fiber-based scheduling model versus the default thread-based scheduling model. Think of fibers as lightweight threads. Use command sp_configure 'lightweight pooling', 1 to enable fiber-based scheduling. Watch processor queuing and context switching to monitor the effect.
The DBCC SQLPERF (THREADS) statement provides more information about I/O, memory, and CPU use mapped to SPID.
Disk I/O Counters

The Disk Write Bytes/Sec and Disk Read Bytes/Sec counters provides the data throughput in bytes per second per logical drive. Weigh these numbers carefully along with the values of the Disk Reads/Sec and Disk Writes/Sec counters. Do not let a low number of bytes per second lead you to believe that the disk I/O subsystem is not busy. A single hard disk drive is capable of supporting a total of 75 nonsequential and 150 sequential disk reads and disk writes per second.
Monitor the Disk Queue Length counter for all drives associated with SQL Server files and determine which files are associated with excessive disk queuing.
If Performance Monitor indicates that some drives are not as busy as others, you can move SQL Server files from drives that are bottlenecking to drives that are not as busy. This spreads disk I/O activity more evenly across hard disk drives. If one large drive pool is used for SQL Server files, than the resolution to disk queuing is to make the I/O capacity of the pool bigger by adding more physical drives to the pool.
Disk queuing may be a symptom that one SCSI channel is saturated with I/O requests. Performance Monitor cannot directly detect if this is true. Hardware vendors can provide tools to detect the amount of I/O serviced by a RAID controller and whether the controller is queuing I/O requests. This is more likely to occur if many disk drives (10 or more) are attached to the SCSI channel and they all perform I/O at full speed. To resolve this issue, take half of the disk drives and connect them to another SCSI channel or RAID controller to balance the I/O. Rebalancing drives across SCSI channels requires a rebuild of the RAID arrays and full backup and restore of the SQL Server database files.
Performance Monitor Graph Output

The chart shows typical counters that Performance Monitor uses to observe performance. The Processor Queue Length counter is being observed. Click BackSpace to highlight the current counter. This helps to distinguish the current counter from other counters being observed and can be particularly helpful when observing many counters at the same time with Performance Monitor.
The Max value for Processor Queue Length is 22.000. Max, Min, and Average values for the Performance Monitor Graph cover only the current time window for the graph as indicated by Graph Time. By default, Graph Time covers 100 seconds. To monitor longer periods of time and to be sure to get representative Max, Min, and Average values for those time periods, use the logging feature of Performance Monitor.
The shape of the Processor Queue Length graph line indicates that the Max of 22 occurred only for a short period of time. But there is a period preceding the 22 value when Processor Queue Length is greater than 5, 100 percent is 22 and there is a period prior to the 22 value when the graph has values above 25 percent, which is approximately five. In this example, the database server SQLCON has only one processor and should not sustain Processor Queue Length greater than two. Therefore, Performance Monitor indicates that the processor on this computer is being overtaxed and that further investigation should be made to reduce the load on the processor or more processors should be added to SQLCON to handle these periods of higher processor workloads.
Other Performance Topics
This section presents other factors that can influence the performance of Microsoft SQL Server.
Reduce Network Traffic and Resource Consumption

Database programmers who work with SQL Server relatively easy to use interfaces such as Microsoft ActiveX Data Objects (ADO), Remote Data Objects (RDO), and Data Access Objects (DAO) database APIs must remain aware of the result sets they are building. ADO, RDO, and DAO provide programmers with database development interfaces that allow rich database rowset functionality without requiring a lot of database programming experience. Programmers can encounter performance problems if they neither account for the data their application is returning to the client nor stay aware of where the SQL Server indexes are placed and how the SQL Server data is arranged. SQL Server Profiler, Index Tuning Wizard, and ShowPlan are helpful tools for pinpointing and fixing problem queries.
Look for opportunities to reduce the size of the result set by eliminating columns in the select list that do not have to be returned, or by returning only the required rows. This reduces I/O and CPU consumption.
For more information, see these topics in SQL Server Books Online: Optimizing Application Performance Using Efficient Data Retrieval, Understanding and Avoiding Blocking, and Application Design.
Deadlocking

If applications accessing SQL Server are built so transactions access tables in the same chronological order across all user transactions, you may avoid deadlocking. You should explain the concept of chronological table access to SQL Server application developers as early as possible during the application design process to avoid deadlocking problems, which are expensive to solve later.
By reducing SQL Server query I/O and shortening transaction time you can help prevent deadlocking. This can make queries faster, lock resources are held for a shorter period of time, and reduce all locking contention (including deadlocking). Use the SQL Server Query Analyzer Show stats I/O option to determine the number of logical page fetches associated with large queries. Use the SQL Server Query Analyzer Show query plan option to review index use and then consider a SQL Server query redesign that is more efficient and uses less I/O.
For more information, see these topics in SQL Server Books Online: Avoiding Deadlocks, Troubleshooting Deadlocking, Detecting and Ending Deadlocks, and Analogy to Nonserializable Transactions.
Language to Avoid in Queries
Use of inequality operators in SQL queries forces databases to use table scans to evaluate the inequalities. These queries generate high I/O if they run regularly against large tables.
For example, using any WHERE expression with NOT in it:
WHERE != some_value
WHERE <> some_value
If you must run these queries, restructure the queries to eliminate the NOT keyword or operator.
Instead of using:
SELECT * FROM tableA WHERE col1 != 'value'
Try using:
SELECT * FROM tableA WHERE col1 < 'value' or col1 > 'value'
SQL Server can use the index (preferably clustered), if it is built on col1, rather than resorting to a table scan.
Smart Normalization
On frequently accessed tables, move columns that a database application does not need regularly to another table. By eliminating as many columns as possible, you can reduce I/O and increase performance. For more information, see these topics in SQL Server Books Online: Logical Database Design and Normalization.
Partitioned Views
You can horizontally partition tables through views in SQL Server . This provides I/O performance benefits when database users query subsections of large views. For example, if a large table documents sales for all sales departments for a year and that retrievals from this table are based on a single sales department, a partitioned view could be employed. A sales table is defined for each sales department, a constraint is defined on the sales department column on each table, and then a view is created on all of the tables to form a partitioned view. The query optimizer uses the constraint on the sales department column. When the view is queried, all of the sales department tables that do not match the sales department value provided in the query are not ignored by the query optimizer and no I/O is performed against those base tables. This improves query performance by reducing I/O.
For more information, see these topics in SQL Server Books Online: Scenarios for Using Views, CREATE VIEW, Using Views with Partitioned Data, Modifying Data Through a View, Copying To or From a View, and Partitioning.
Replication and Backup Performance
If you ensure that the disk I/O subsystem and CPUs are performing well, you ensure performance benefits to all SQL Server operations, including replication and backups. Transactional replication and transaction log backups read from transaction log files. Snapshot replication and backups perform serial scans of database files. The SQL Server storage structures have made these operations fast and efficient, as long as there is no queuing occurring in the database server CPUs or disk subsystems.
For more information, see these topics and/or keywords in SQL Server Books Online: Optimizing Backup and Restore Performance, Creating and Restoring Differential Database Backups, Creating and Applying Transaction Log Backups, Using Multiple Media or Devices, Minimizing Backup and Recovery Times in Mission-Critical Environments, Backup/Restore Architecture, SQL Server on Large Servers, and replication performance.
The EMC Disk I/O Tuning Scenario
For those implementing SQL Server database systems on the unique EMC Symmetrix Enterprise Storage Systems, some disk I/O balancing methods can help avoid disk I/O bottleneck problems and maximize performance.
Symmetrix storage systems contain up to 16 GB of RAM cache and contain internal processors within the disk array that help speed the I/O processing of data without using host server CPU resources. You must understand the four components in the Symmetrix storage system to balance disk I/O. One component is the 16-GB cache inside the Symmetrix. Up to 32 SA channels can be used to cable up to 32 SCSI cards from Windows NT-based host servers into the Symmetrix; all of these SA channels can be requesting data simultaneously from the 16-GB cache. Within the Symmetrix storage system, there are up to 32 connectors called DA controllers (internal SCSI controllers) that connect all of the internal disk drives within the Symmetrix into the 4-GB internal cache. And finally, there are the hard disk drives within the Symmetrix.
EMC hard disk drives are SCSI hard disk drives with the same I/O capability of the other SCSI drives referred to in this document. One feature commonly used with EMC technology is referred to as hyper-volumes. A hyper-volume is the logical division of an EMC hard disk drives, so the hyper-volume looks like another physical drive to the Windows NT Disk Administrator and can be manipulated with Windows NT Disk Administrator like any other disk drive. Multiple hyper-volumes can be defined on each physical drive. You should work closely with EMC field engineers to identify how hyper-volumes are defined when conducting database performance tuning on EMC storage. You can overload a physical drive with database I/O if you think two or more hyper-volumes are separate physical drives but actually are two or more hyper-volumes on the same physical drive.
SQL Server I/O activities should be divided evenly among distinct DA controllers because DA controllers are assigned to a defined set of hard disk drives. DA controllers are not likely to suffer an I/O bottleneck, but the set of hard disk drives associated with a DA controller may be more susceptible. SQL Server disk I/O balancing is accomplished the same way with DA controllers and their associated disk drives as with other vendors disk drives and controllers.
To monitor the I/O on a DA channel or separate physical hard disk drives, get help from EMC technical support staff because I/O activity occurs beneath the EMC internal cache and is not visible to Performance Monitor. EMC storage units have internal monitoring tools that allow an EMC technical support engineer to monitor I/O statistics within the Symmetrix. Performance Monitor can only see I/O coming to and from an EMC storage unit by the I/O coming from a SA channel. This is enough information to indicate that a SA channel is queuing disk I/O requests, but is not enough information to determine the disk or disks that are causing the disk queuing. If SA channel is queuing, it may be the disk drives and not the SA channel that is causing the bottleneck. One way to isolate the disk I/O bottleneck between the SA channels and the DA channels and drives is to add a SCSI card to the host server and connect it to another SA channel. If Performance Monitor indicates that I/O across both SA channels has not changed in volume and disk queuing is still occurring, then the SA channels are not causing the bottleneck. Another way to isolate the I/O bottleneck is to have an EMC engineer use EMC monitoring tools to monitor the EMC system and analyze the drives or DA channels that are bottlenecking.
Divide SQL Server activities evenly across as many of disk drives as are available. If you work with a smaller database that sustain a large amount of I/O, consider the size of hyper-volume to have EMC technical engineers define. Suppose the SQL Server will consist of a 30-GB database. EMC hard disk drives can provide up to 23 GB in capacity, so you can fit the entire database onto 2 drives. For manageability and cost, this is appealing; but for I/O performance, it is not. An EMC storage unit can work with more than 100 internal drives. Involving only 2 drives for SQL Server can lead to I/O bottlenecks. Consider defining smaller hyper-volumes, perhaps of 2 GB each. Then approximately 12 hyper-volumes can be associated with a given 23-GB hard disk drive. Assuming 2-GB hyper-volumes, 15 hyper-volumes are required to store the database. Make sure that each hyper-volume is associated with a separate physical hard disk drive. Do not use 12 hyper-volumes from 1 physical drive and then 3 hyper-volumes associated on another physical drive, because this is the same as using 2 physical drives (150 nonsequential I/O / 300 sequential I/O across the 2 drives). But with 15 hyper-volumes, each of which are associated with a separate physical drive, SQL Server uses 15 physical drives for providing I/O (1125 nonsequential / 2250 sequential I/O activity per second across the 15 drives).
Also consider employing several SA channels from the host server to divide the I/O work across controllers for host servers that support more than a single PCI bus. Consider using one SA channel per host server PCI bus to divide I/O work across PCI buses as well as SA channels. On EMC storage systems, each SA channel is associated with a specific DA channel and a specific set of physical hard disk drives. Because SA channels read and write their data to and from the EMC internal cache, it is unlikely the SA channel is a point of I/O bottleneck. Because SCSI controller bottlenecks are not likely, it is probably best to concentrate on balancing SQL Server activities across physical drives rather than worry about how many SA channels to use.

To find more information on this subject mail to
sqlcon@yahoogroups.com

SQL BLOG WITH LOG

Thursday, March 25, 2004

Previous Posts

Archives

About Me