Thursday, May 27, 2004

Plan your server before it overflows

Before planning the capacity of a SQL Server, we must consider the following areas:

1. Resource Requirements.
2. Process Flow Chart.
3. System Classification.
4. CPU and Memory Management.
5. Disk Planning.
6. Data Storage Subsystem Management.
7. Database File Placement.
8. Monitoring for capacity management.


1. Resource Requirements

Resource requirements here means recognizing and making use of the various knowledge and skill-set domains available around us, each of which can play an effective role in capacity planning. They are as follows:

(A) Network domain administrator: brings the perspective of network bandwidth and of how much congestion the application will create once it is online, as well as ports and other related network issues.

(B) Vendor representative: you must get the critical technical know-how about the hardware supplied by your vendor, not only from a financial perspective but from a performance, scaling, and reliability perspective as well. Obtain the best-practices documentation for the respective hardware.

(C) Application developer: developers are the source of any change in the application, so interaction with them is critical. Any change in data access or database schema has to be logged and carefully analyzed for performance and other side effects.


3. System Classification

We must create a standardized classification system that serves as a common reference point for transactional volume, read/write activity, and data mass (size). Then, rather than describing a system as a “small reporting system,” you can more accurately identify your system in relation to others.



The generic codes used here are simply for the purpose of providing an example. To create a classification code that is specific to your system, use the following measures:

· Transactions per second (TPS). For the purposes of this paper, TPS is measurable through the \SQLServer:Databases(_Total)\Transactions/sec counter in System Monitor. You can also observe transaction statistics by analyzing a trace containing Transaction event class data, or by creating a customized counter specific to your system.
· The read/write ratio.
· The total size of the related database files (mass).

The measurement of the database size (mass) can be applied at any level in the chart. If you have a terabyte of data, you would indicate this by adding the prefix “1T” to the classification code, like this: 1T 7000 R100. If you have 100 GB of data, the code might look like this: 100G 500 R20.
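As a quick illustration, here is a minimal Python sketch that assembles such a classification code from the three measures above; the function name, the rounding, and the cut-over from a gigabyte to a terabyte prefix are my own assumptions, not part of the original scheme:

def classification_code(data_size_gb, tps, read_pct):
    # Build a code like "100G 500 R20": mass prefix, TPS, read percentage.
    if data_size_gb >= 1000:
        mass = f"{data_size_gb / 1000:g}T"
    else:
        mass = f"{data_size_gb:g}G"
    return f"{mass} {round(tps)} R{round(read_pct)}"

print(classification_code(1000, 7000, 100))  # -> 1T 7000 R100
print(classification_code(100, 500, 20))     # -> 100G 500 R20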

4. CPU and Memory Capacity Management

CPU and memory are among the most critical components of any system. Based on historical performance data and the track record of resource growth, one can forecast and predict the usage of both.

CPU Planning
Processor planning is fairly straightforward. Monitor your current CPU utilization (\Processor(_Total)\% Processor Time). If the average is over 50 percent, if you have frequent peak usage periods when CPU utilization spikes over 90 percent, or if usage spikes and stays high for a while, you should consider adding additional or faster processors.
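For example, the rule of thumb above could be checked against a set of logged counter samples with a small script like the following (a sketch only; the 5 percent spike-frequency cut-off and the function name are assumptions):

def needs_more_cpu(samples, avg_threshold=50.0, spike_threshold=90.0):
    # samples: "% Processor Time" values exported from a System Monitor log.
    average = sum(samples) / len(samples)
    spike_fraction = sum(1 for s in samples if s > spike_threshold) / len(samples)
    # Flag the server if sustained load is high or spikes are frequent.
    return average > avg_threshold or spike_fraction > 0.05

print(needs_more_cpu([35, 42, 95, 97, 96, 40, 38, 91, 44, 47]))  # True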

In general, the processors you choose should be able to deliver the speed implied in your other system purchases. If your system is highly specialized and filled with processor-intensive activities, you will become aware of that as you observe the system over time. Examples of such activities include extensive or frequent usage of Data Transformation Services, or anything involving many calculations (science, accounting, and so on). SQL Server is a CPU-intensive application, so look for processors with a large high-speed cache. Always get the fastest and newest when it comes to processing power, because the processor enables the rest of the server to do its job well.

If you have a dedicated SQL Server computer, use all the processors for SQL Server. If your system is running applications in addition to SQL Server (such as Microsoft Commerce Server), then consider restricting SQL Server from using one or more processors. Otherwise, allow SQL Server and Windows to balance across all processors, as they were designed to do.


Memory Planning

While the sum of all hardware together dictates the capacity of a system, memory serves mainly to optimize data access. SQL Server uses memory to store execution plans, keep data pages cached between uses, and so on. Without enough memory, you will incur more disk I/O in reading data. If your system does many reads, you might reduce disk I/O significantly by increasing your memory, because the data will then remain in cache. Insufficient memory, or over-allocation of memory, can result in paging. Memory plays an important role in SQL Server, and it is a resource you should carefully monitor.

For systems such as decision support systems (DSS) for which reads are the highest priority, more memory is better. Memory can be used to compensate for disk I/O, and large amounts of memory can significantly decrease the number of disk spindles you will need to achieve high performance.

For systems such as online transaction processing (OLTP) systems for which writes are the highest priority, memory is still an important part of the system, but you may benefit more from the addition of disk spindles and more or faster controller channels, rather than memory. Carefully monitor your system to see which resources are in highest demand.
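A crude way to turn this guidance into a first-cut decision is sketched below; the read-percentage and cache-fit thresholds are illustrative assumptions, not rules from the text:

def next_investment(read_pct, hot_data_gb, buffer_cache_gb):
    # Read-heavy (DSS-style) workloads whose working set exceeds the buffer
    # cache benefit most from extra memory; write-heavy (OLTP-style) workloads
    # usually benefit more from extra spindles and controller channels.
    if read_pct >= 80 and hot_data_gb > buffer_cache_gb:
        return "add memory (frequently read data does not fit in cache)"
    if read_pct < 50:
        return "add disk spindles or faster controller channels"
    return "keep monitoring; no single resource clearly dominates"

print(next_investment(read_pct=90, hot_data_gb=12, buffer_cache_gb=4))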

5. Disk Planning

The important point to remember about data storage is that the number of disks is far more important than the total storage size of the disks.

At any point in time, the raw space required by a system is given by the following equation:

Minimum disk space required =
   Size of data (per database, including the system databases)
 + Size of indexes (per database, including the system databases)
 + Planned growth
 + MS DTC logging space
 + Amount of OS reserved space
 + Amount reserved for hardware optimization
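Translated directly into code, the equation is just a sum; this small helper (the parameter names and example figures are mine) keeps everything in the same unit, for example gigabytes:

def minimum_disk_space(data_size, index_size, planned_growth,
                       dtc_log_space, os_reserved, hw_reserved):
    # All arguments are in the same unit (for example GB).
    return (data_size + index_size + planned_growth
            + dtc_log_space + os_reserved + hw_reserved)

# Example: 60 GB data, 15 GB indexes, 25 GB planned growth, 1 GB MS DTC
# logging, 4 GB OS reserve, 5 GB reserved for hardware optimization.
print(minimum_disk_space(60, 15, 25, 1, 4, 5))  # 110 (GB)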

To know the system, we must know how the application uses its databases, not just the user-defined ones but system databases such as tempdb and msdb as well.

One big physical disk may hold all your data, but it still has only one disk arm to service each data request, one at a time.

The more disk arms you have, the better off you will be. So, when you size for new disks, do a quick check to be sure this is enough drive space; but spend more time on analyzing how many spindles you really need. For example, if your system performs a lot of transactions, you will enhance performance by adding more spindles (provided there is sufficient memory and CPU to support the system).

When you are ordering your hardware, request a specific number of disks, rather than a specific amount of disk space. Having more small disks is better than having fewer large disks. If you have external storage, go for the fastest array controller card, and one that has multiple channels. Look at this card as a potential bottleneck: if you have multiple spindles, you need to invest in a card that can support them. The performance you achieve will be directly proportional to the quality of the controller and the type of I/O your system produces.

In OLTP, you can have more disks per controller card: because the disks spend more time seeking for data, the controller channel will not become as saturated.

DISK CONTROLLERS

Cache
Not all write caching is safe for use by a database server. Make sure that your disk controller has features such as safeguards against uncontrolled reset of the caching controller, on-board battery backup, and mirrored or error-checking-and-correcting memory.

Do not implement write caching unless the hardware vendor guarantees that their write cache includes these features and any others required to prevent data loss.

Array accelerator cache settings can be left at the default value, which is typically 50:50 read:write. These settings can also be adjusted to favor reads or writes if you know which your system requires. Note that if you are using a write setting above zero here, you have enabled write caching.


If your array controller supports more than one channel, make sure you take advantage of it. Fast channels have a tremendous effect on I/O performance.

Windows NT File System (NTFS) Allocation Unit

SCSI Drives: When you format the new drives in Disk Administrator, you should consider an allocation unit, or block size, that will provide optimal performance.

Significant performance gains may be obtained by sizing this to a larger value in order to reduce disk I/O; however, the default value is based on the size of the physical disk. The best practice for SQL Server is to choose 64 KB, because this reduces the likelihood of I/O that spans distinct NTFS allocations, which then might result in split I/O.

Keep in mind that although this information can be useful, the type of storage you are using (and in some cases your backup software) will drive the format of your disks. If you are changing the block size on an existing system, be sure to run a baseline in your test environment and another after you have tested the changes.


6. Data Storage Subsystem Management

Whenever you are choosing disk storage, also consult the run book, point 12, which discusses RAID. We should not be concerned only with the space requirements in gigabytes and terabytes but also with the physical drive setup. Prefer hardware RAID over OS RAID: hardware RAID provides more redundancy if hardware crashes and has a performance benefit, whereas OS RAID uses processor cycles and keeps SQL Server from performing at its best.

Two core RAID levels are of value for a database server: striping with parity (RAID 5) and striped mirror (RAID 0+1). The best overall option is to choose RAID 0+1 (also called RAID 01 or “striped mirror”). RAID 5 can be used in certain circumstances, but is generally more expensive and less reliable in the long run.

In RAID 5, each time data is written to disk, it actually takes four I/O operations: the existing data and parity blocks are read, and the new data and parity blocks are written. This is slow for two reasons: first, the operations are consecutive, so they must wait for each other; second, they occur while many other transactions are vying for the disk resources. RAID 0+1 writes to the primary disk and the mirror in one operation. Although you do have to wait for the write to complete on both drives, both writes are simultaneous.

RAID 5 therefore takes a performance hit when writing data; its only advantage is that you get more space at less cost than with RAID 0+1.
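The write penalty can be made concrete with a small back-of-the-envelope calculation; the per-disk IOPS figure and the example workload below are assumptions chosen only to illustrate the four-versus-two I/O difference described above:

def backend_write_iops(logical_writes_per_sec, raid_level):
    # Each logical write costs about 4 physical I/Os on RAID 5 (read data +
    # read parity, write data + write parity) and 2 on RAID 0+1 (primary + mirror).
    penalty = {"RAID 5": 4, "RAID 0+1": 2}
    return logical_writes_per_sec * penalty[raid_level]

for level in ("RAID 5", "RAID 0+1"):
    iops = backend_write_iops(500, level)   # assume 500 logical writes/sec
    spindles = -(-iops // 120)              # ceiling at ~120 IOPS per disk (assumed)
    print(f"{level}: {iops} physical write IOPS, roughly {spindles} spindles")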


Although data can be restored from database backups, it is important to note the effect that failed drives can have. If any two disks fail in RAID 5, the database will stop (unless you have a hot standby disk that has been synced within the chain, but in any case, you cannot lose more than one disk from the whole working set).

RAID 0+1 will stop the database only if a disk fails in both sides of a mirrored set at the same time, and the odds of that occurring based on random factors are about 5.3 percent. RAID 5 imposes a significant penalty for losing even one disk. When one drive is lost on a RAID 5 system, the read performance of the system immediately decreases.


Every read or write request to the failed drive initiates a verification process against all other drives in the parity group.

This performance degradation exists until the drive is replaced and completely rebuilt by the system. During the rebuild process, the system is more sensitive to load due to the considerably heavier I/O requirements of the failed state.

This can be a critical consideration. RAID 0+1 sees minimal loss of performance in a failed state where the hardware allows reads from both disks in a set. In this case, read performance is slightly reduced, but only for data stored on that particular set. RAID 0+1 can actually read simultaneously from both drives in a mirrored set. This is not a simultaneous read for the same I/O operation, but for different ones.

So when you have multiple read requests for the same physical disk, the I/O operations are spread over the two disks in the mirrored set.


7. Database File Placement.

As discussed in the run book section about database filegroup placement, we are just re-emphasizing the same points here.

A. For SCSI and SAN systems, the most important considerations in determining where to place your files on the server are the number of disk spindles available to a particular drive and the speed of the drives involved. For this reason, it is good to design the server hardware requirements and/or layout with your database needs in mind. Be careful about buying hardware before you have a firm design plan.

B. If you have a set of tables that is used together frequently, consider putting these tables in separate filegroups on separate physical drives, to balance I/O between them. In a larger, more heavily used system, this could make a significant difference.

C. If disk I/O is a problem, and you cannot add more spindles to the set of disks, consider putting non-clustered indexes in a separate filegroup on a separate disk, in order to split I/O between filegroups.

D. Group your tables based on usage, in order to generate as many simultaneous reads to different filegroups (and therefore disks) as possible. Grouping tables into filegroups based on a maintenance need for convenient backup plans will not generate as much performance as separating the tables and indexes by usage.

E. If you have more than enough spindles for your data performance, consider breaking the data across filegroups on this set of disks, for the purpose of speeding up some of your administrative tasks, such as reindexing. The main reason to do this, however, is to speed up any necessary restores.

G. For systems smaller than Class 1000A, you could use Auto Grow for your database files. For systems that fit in this class or above, keep in mind that when a “grow” is initiated, transactions must wait while the database grows.
In a small database or lightly queried system this is not a big issue, but if you have a 100 GB OLTP database set to grow in 10 percent increments, and it runs out of space during peak times, the online users will be held up while the 10 GB is allocated.
(Allocation speed per GB can be measured by copying a 1 GB file to the data drive on that server.)
For these systems, the best practice is to anticipate database growth and manually increase the database at a scheduled time. Or, choose a reasonable amount to grow by that is neither too cumbersome nor so small that it will initiate expansion too frequently.
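To get a feel for the impact, the 100 GB example above can be worked through with a measured allocation rate; the 20 seconds per GB used here is purely an assumed figure, which you would replace with your own measurement (for instance from timing a 1 GB file copy to the data drive):

database_size_gb = 100
growth_increment = 0.10     # autogrow set to 10 percent
seconds_per_gb   = 20       # assumed allocation rate; measure on your own server

grow_gb  = database_size_gb * growth_increment
wait_sec = grow_gb * seconds_per_gb
print(f"An autogrow of {grow_gb:.0f} GB could stall transactions for ~{wait_sec:.0f} seconds")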

Ideally, the best plan is to expand your database for six to twelve months’ growth, or whatever seems feasible for the size of your data and the rate of growth compared to the budget and hardware available. Although it is administratively easier to let the system autogrow, there are two risks associated with this:

H. If you autogrow to the point that the disk fills beyond 80 percent, you will experience performance degradation as the disk continues to fill.
I. If the disk fills completely or to the point that the database cannot grow by the allotted amount or percentage, the database will stop. If this occurs, the only option is to make more physical space available by adding storage:

If the data disk is configured as a dynamic disk, you can add more disks from your emergency spares and expand the array and the logical disk. If it is configured as a basic disk, such as in the case of a failover cluster, then you can add sufficient disks to make a new RAID set, create another file on that drive, and add it into the database filegroup.

J. If the transaction log disk fills completely, or to the point where the log cannot autogrow in the space left, the database will stop. To rectify this, you must analyze whether you have an unusual problem that can be resolved by dumping and shrinking the transaction log.

K. If you have multiple files in your filegroup and you add another one, you will need to expand each of them in order to re-establish proportional fill.
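A minimal sketch of that expansion, assuming a hypothetical Sales database with three data files in one filegroup, might look like the following (the connection string, database, logical file names, and target size are all placeholders; ALTER DATABASE ... MODIFY FILE is the underlying command):

import pyodbc

# Expand every file in the filegroup to the same size so that proportional
# fill stays balanced after a new file has been added.
conn = pyodbc.connect("DRIVER={SQL Server};SERVER=MYSERVER;Trusted_Connection=yes",
                      autocommit=True)   # ALTER DATABASE cannot run inside a transaction
cur = conn.cursor()

new_size_mb = 20480
for logical_name in ("Sales_Data1", "Sales_Data2", "Sales_Data3"):
    cur.execute(f"ALTER DATABASE [Sales] MODIFY FILE "
                f"(NAME = '{logical_name}', SIZE = {new_size_mb}MB)")
conn.close()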

LOG FILE PLACEMENT (Important)

Here are a few tips and best practices regarding the placement of your log files:

· Create the transaction log on a physically separate disk or RAID array. The transaction log file is written sequentially; therefore, using a separate, dedicated disk allows the disk heads to stay in place for the next write operation. For this reason, smaller systems will do well by using a single mirrored disk for the transaction log. A single mirrored physical disk should support up to approximately 1,000 transactions per second, depending on the speed of the disk itself. Systems requiring more than that should stripe the transaction log across a RAID 0+1 array for maximum performance. For highest bandwidth, the RAID controller on this array should have a (battery-backed) write-back cache to speed log writes.

· Set your transaction log to autogrow, but try to size it so it should not need to grow. Base the optimal size on your recovery model, the level of logged activity in the database, and the interval of time between backups. Set the growth increment to a reasonable percentage, but try to anticipate when the log should be resized. If the transaction log expands too frequently or takes a long time to expand, performance can be affected.

· Base the size of the log on your current recovery model and your application design. If you find that you need to shrink the log periodically, investigate what is causing the log to fill up, in order to fix the problem rather than simply fixing the symptom.



TEMPDB FILE PLACEMENT

Here are a few tips and best practices regarding the placement of your tempdb files:

· Place the tempdb database on a fast I/O subsystem to ensure good performance. Stripe the tempdb database across multiple disks for better performance. Move the tempdb database to disks different from those used by user databases.

· The tempdb database can be located with the data in most situations. For larger systems that make heavy use of tempdb, consider putting tempdb on a set of disks by itself, to achieve extra performance. It is not a good idea to co-locate any database files with the page file of the operating system.
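If you do decide to relocate tempdb to its own disks, a hedged sketch of the move is shown below; the drive letter and paths are placeholders, the default logical names (tempdev and templog) should be confirmed with sp_helpfile, and the new location only takes effect after SQL Server is restarted:

import pyodbc

conn = pyodbc.connect("DRIVER={SQL Server};SERVER=MYSERVER;Trusted_Connection=yes",
                      autocommit=True)
cur = conn.cursor()
# Point both tempdb files at the dedicated drive; effective after a restart.
cur.execute("ALTER DATABASE tempdb MODIFY FILE "
            "(NAME = tempdev, FILENAME = 'T:\\MSSQL\\Data\\tempdb.mdf')")
cur.execute("ALTER DATABASE tempdb MODIFY FILE "
            "(NAME = templog, FILENAME = 'T:\\MSSQL\\Data\\templog.ldf')")
conn.close()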


8. Monitoring for capacity management.

From a capacity point of view, we must monitor the following points, store the historical data, and forecast accordingly (a minimal collection sketch follows the list):

· Disk drive capacities
· Database sizes and amount of free space inside each database
· Comparison of the database size to the disk drive space
· Rate of database growth
· Location (drives) of data files
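
A simple nightly snapshot covering the database-size and growth-rate items could be collected with something like the script below; the connection string and output file are placeholders, and drive capacities and free space inside each database would be gathered in the same spirit (for example from the OS and from sp_spaceused):

import datetime
import pyodbc

# Record each database's total file size once a day so growth can be trended.
conn = pyodbc.connect("DRIVER={SQL Server};SERVER=MYSERVER;Trusted_Connection=yes")
cur = conn.cursor()

today = datetime.date.today().isoformat()
databases = [row[0] for row in cur.execute("SELECT name FROM master..sysdatabases").fetchall()]

with open("capacity_history.csv", "a") as log:
    for db_name in databases:
        # sysfiles reports file sizes in 8 KB pages; convert to MB.
        pages = cur.execute(f"SELECT SUM(size) FROM [{db_name}]..sysfiles").fetchone()[0]
        log.write(f"{today},{db_name},{pages * 8 / 1024:.1f}\n")

conn.close()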