Description:
Troubleshooting or testing performance on a database system can be an overwhelming task. The reasons for poor performance can range from a poorly designed database to a system that is improperly configured for the workload.
Use the tables below as a Performance Counter reference while troubleshooting or testing (the purple objects are the ones I use within performance testing):
SQL Server: Counters to Watch

Processor:

| Counter | Preferred Value | Description |
| --- | --- | --- |
| % Processor Time | < 80% | Percentage of elapsed time the processor spends executing non-idle threads. |
Process (sqlservr):

| Counter | Preferred Value | Description |
| --- | --- | --- |
| % Processor Time | < 70-80% | Percentage of processor time spent on SQL Server process threads. You may also wish to investigate other Process (sqlservr) counters, such as Private Bytes, Virtual Bytes, and Working Set, to get a fuller understanding of how SQL Server allocates certain segments of memory. Usually these auxiliary counters provide contextual information and are not necessary for troubleshooting. |
| ID Process | .. | When multiple SQL Server instances are running, use this ID to match each sqlservr process to its instance and determine the % Processor Time for each (see the sketch below). |
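If you need to match a sqlservr process to its instance without leaving the database, the engine can report its own process ID. A minimal sketch, assuming SQL Server 2012 or later, where SERVERPROPERTY('ProcessID') is available:

```sql
-- Returns the OS process ID (PID) of this instance's sqlservr.exe.
-- Match it against the Process(sqlservr[#N])\ID Process counter in PerfMon.
SELECT SERVERPROPERTY('ProcessID') AS sqlservr_pid;
```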
System:

| Counter | Preferred Value | Description |
| --- | --- | --- |
| Processor Queue Length | < 4 per CPU | Number of threads waiting for CPU cycles, where < 12 per CPU is good/fair, < 8 is better, and < 4 is best. |
| Context Switches/sec | ... | Number of execution contexts switched in the last second, where > 6000 is poor, < 3000 is good, and < 1500 is excellent. |
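When PerfMon is not at hand, a rough in-engine analog of the processor queue is the number of runnable tasks per scheduler; a runnable task has work ready but is waiting for CPU. A sketch against sys.dm_os_schedulers, to be read against the < 4 per CPU guideline above:

```sql
-- Runnable tasks are ready to run but waiting for CPU time,
-- roughly analogous to System\Processor Queue Length per CPU.
SELECT scheduler_id,
       current_tasks_count,
       runnable_tasks_count
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';  -- user schedulers only
```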
Memory:

| Counter | Preferred Value | Description |
| --- | --- | --- |
| Available MBytes | > 100 MB | Represents the available memory in megabytes remaining on your server. This counter sums the amount of space on the zeroed, free, and standby memory lists. Zeroed memory is memory that is filled with zeros to prevent other processes from viewing it. Free memory is memory ready for use by a process. Standby memory is memory that was removed from processes and is waiting to go to disk. This counter represents the last observed value, not an average, so monitor it over time to ensure that you are not left with minimal memory remaining. If the server that runs SQL Server has other applications running on it, consider moving those applications if available memory is consistently below your threshold. Monitor this threshold closely, because SQL Server does not perform well when it does not have enough memory. |
| Pages Input/sec | < 10 pages/sec | Tracks the rate of pages read from disk to resolve hard page faults. Ideally, you do not want more than 10 pages per second read into memory to resolve hard page faults. This counter is extremely useful for determining what share of page faults/sec are hard page faults: pages input per second / page faults per second = percentage of hard page faults (see the worked example below). When the hard page fault percentage exceeds 40% for an extended period of time, the chances are your system has memory issues. |
| Pages/sec | < 50 (on average) | Shows the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays. It is the sum of Memory\Pages Input/sec and Memory\Pages Output/sec. The average Pages/sec value should be below 50. It is not possible to reduce the value to zero, as paging always occurs while the operating system is running; the more memory the server has, the fewer pages have to be read and written due to page faults. |
| Page Faults/sec | Use your baseline | Represents the rate of page faults per second occurring on your system. This counter tracks both hard and soft page faults. Hard page faults have to go to disk to retrieve the requested data, while soft page faults retrieve the requested data from physical memory. Because this value changes from system to system, use your baseline to determine when the page fault rate deviates from the norm. Do not be alarmed by a large number of page faults; your system can probably handle them without issue. However, pay close attention to the number of hard page faults, and monitor them along with the physical I/O counters to ensure that the hard page faults are not causing disk I/O problems. |
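As a worked example of the hard page fault formula above (the numbers are made up): 40 pages input/sec against 80 page faults/sec gives 40 / 80 = 50% hard faults, over the 40% warning threshold. To check memory headroom from inside the engine rather than from PerfMon, sys.dm_os_sys_memory exposes roughly the same signal as Available MBytes; a minimal sketch:

```sql
-- Physical memory as seen by the OS, in MB, plus the OS memory
-- state description (e.g. 'Available physical memory is high').
SELECT total_physical_memory_kb / 1024     AS total_mb,
       available_physical_memory_kb / 1024 AS available_mb,
       system_memory_state_desc
FROM sys.dm_os_sys_memory;
```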
Paging File:

| Counter | Preferred Value | Description |
| --- | --- | --- |
| % Usage | < 70% | Percentage of the page file currently in use; high values indicate the server is substituting disk space for memory. |
| % Usage Peak | < 70% | The highest % Usage value observed since the server was last restarted. |
PhysicalDisk:

| Counter | Preferred Value | Description |
| --- | --- | --- |
| Avg. Disk Queue Length | < 2 per disk | Shows the average number of read and write requests that were queued on the selected physical disk. The higher the number, the more disk operations are waiting; it requires attention if this value frequently exceeds 2 during peak usage of SQL Server. If you have multiple drives, divide this number by the number of drives in the array to see if it is above 2. For example, with 4 drives and a disk queue length of 10, the value to use is 10 / 4 = 2.5, not 10. |
| Avg. Disk sec/Read | < 20 ms | Shows the average time, in seconds, of a read of data from the disk. Up to 10 ms is good; below 20 ms is still acceptable; any higher value needs further investigation. |
| Avg. Disk sec/Write | < 20 ms | Shows the average time, in seconds, of a write of data to the disk. Up to 10 ms is good; below 20 ms is still acceptable; any higher value needs further investigation. |
| % Disk Time | < 50% | The percentage of elapsed time that the disk drive was busy servicing read or write requests. |
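The same latency thresholds can be checked per database file from inside SQL Server. A sketch against sys.dm_io_virtual_file_stats (values are cumulative since instance startup, so they smooth out short spikes); compare the averages with the < 20 ms guideline above:

```sql
-- Average I/O latency per database file since instance startup.
SELECT DB_NAME(vfs.database_id) AS database_name,
       vfs.file_id,
       1.0 * vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       1.0 * vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
ORDER BY avg_read_ms DESC;
```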
SQL Server Buffer Manager:

| Counter | Preferred Value | Description |
| --- | --- | --- |
| Buffer Cache Hit Ratio | > 90% (closer to 99%) | Indicates how often SQL Server goes to the buffer, not the hard disk, to get data. In OLTP applications, this ratio should exceed 90%, and ideally be over 99%. If your buffer cache hit ratio is below 90%, you should consider adding RAM. If the ratio is between 90% and 99%, you should still consider adding more RAM: the closer you get to 99%, the faster your SQL Server will perform. In some cases, if your database is very large, you may not be able to get close to 99% even with the maximum amount of RAM in your server; all you can do is add as much as you can and live with the consequences. |
| Checkpoint Pages/sec | Use your baseline | The number of pages per second flushed to disk by checkpointing or other methods. |
| Page Life Expectancy | > 300 | The time in seconds before the average data page is removed from the buffer cache. If the average page life falls below 300 seconds, your SQL Server may require more RAM to improve performance. |
| Lazy Writes/sec | < 20 | Number of buffers written per second by the lazy writer system process. The lazy writer flushes out old, dirty buffer cache pages to make them available for reuse. If the value of this counter is higher than 20, the server could use additional RAM. |
| Page reads/sec | < 80-90 | Number of physical database page reads issued. 80-90 per second is normal; anything above that indicates an indexing or memory constraint. Values for this counter will vary between database applications, but this information is useful when determining whether SQL Server is the primary application using the disk. If the Buffer Manager page reads/writes are low but disk queue lengths are high, there might be a disk bottleneck; if page reads/writes are higher than normal, a memory shortage is likely. |
| Page writes/sec | < 80-90 | Number of physical database page writes issued. 80-90 per second is normal; if it is higher, check the Lazy Writes/sec and Checkpoint Pages/sec counters, and if those are also relatively high, it is a memory constraint. Values for this counter will vary between database applications, but this information is useful when determining whether SQL Server is the primary application using the disk. If the Buffer Manager page reads/writes are low but disk queue lengths are high, there might be a disk bottleneck; if page reads/writes are higher than normal, a memory shortage is likely. |
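All of the Buffer Manager counters are also readable with T-SQL from sys.dm_os_performance_counters. One gotcha worth showing: Buffer Cache Hit Ratio is stored as a raw fraction and has to be divided by its base counter. A sketch (the LIKE filter is an assumption meant to cover both default and named instances):

```sql
-- Buffer cache hit ratio = ratio counter / base counter * 100.
SELECT 100.0 * r.cntr_value / NULLIF(b.cntr_value, 0) AS buffer_cache_hit_pct,
       p.cntr_value                                   AS page_life_expectancy_sec
FROM sys.dm_os_performance_counters AS r
JOIN sys.dm_os_performance_counters AS b
  ON  b.object_name  = r.object_name
  AND b.counter_name = 'Buffer cache hit ratio base'
JOIN sys.dm_os_performance_counters AS p
  ON  p.object_name  = r.object_name
  AND p.counter_name = 'Page life expectancy'
WHERE r.object_name LIKE '%Buffer Manager%'
  AND r.counter_name = 'Buffer cache hit ratio';
```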
SQL Server Memory Manager:

| Counter | Preferred Value | Description |
| --- | --- | --- |
| Memory Grants Pending | Should be close to zero | Total number of processes waiting to acquire a workspace memory grant. |
| Memory Grants Outstanding | ... | The number of processes that have successfully acquired a workspace memory grant. |
| Target Server Memory (KB) | ... | The amount of memory that SQL Server would like to have. High or rising values can indicate memory pressure. |
| Total Server Memory (KB) | <= server RAM | The current amount of memory that SQL Server is using. If this counter is still growing, the server has not yet reached steady state and is still trying to populate the cache and get pages loaded into memory. Performance will likely be somewhat slower during this time, since more disk I/O is required at this stage; this behavior is normal. Eventually Total Server Memory should approximate Target Server Memory. |
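To see where an instance stands against these Memory Manager counters in one pass, the same DMV works; a minimal sketch (Target and Total are reported in KB):

```sql
-- Pending grants should be ~0; Total should approximate Target.
SELECT RTRIM(counter_name) AS counter_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Memory Manager%'
  AND counter_name IN ('Memory Grants Pending',
                       'Memory Grants Outstanding',
                       'Target Server Memory (KB)',
                       'Total Server Memory (KB)');
```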
SQL Server General Statistics:

| Counter | Preferred Value | Description |
| --- | --- | --- |
| User Connections | ... | The number of users currently connected to the SQL Server. Note: review this counter along with Batch Requests/sec. A surge in user connections should produce a surge in Batch Requests/sec, so if there is a disparity (one going up while the other stays flat or goes down), that may be a cause for concern. With a blocking problem, for example, you might see user connections, lock waits, and lock wait time all increase while Batch Requests/sec decreases. |
| Logins/sec | ... | Total number of logins started per second. This does not include pooled connections. |
| Temp Tables Creation Rate | ... | Number of temporary tables/table variables created per second. |
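User Connections is a point-in-time value, so a single read is enough; as noted above, it is most meaningful side by side with Batch Requests/sec. A minimal sketch:

```sql
-- Current user connections (point-in-time, no sampling needed).
SELECT cntr_value AS user_connections
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%General Statistics%'
  AND counter_name = 'User Connections';
```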
SQL Server SQL Statistics:

| Counter | Preferred Value | Description |
| --- | --- | --- |
| Batch Requests/sec | ... | Measures the number of batch requests SQL Server receives per second, and generally tracks how busy your server's CPUs are. Generally speaking, over 1000 batch requests per second indicates a very busy SQL Server and could mean that, if you are not already experiencing a CPU bottleneck, you may be soon. Of course, this is a relative number: the bigger your hardware, the more batch requests per second SQL Server can handle. From a network bottleneck perspective, a typical 100 Mbps network card can only handle about 3000 batch requests per second; if you have a server that is this busy, you may need two or more network cards, or a 1 Gbps network card. |
| SQL Compilations/sec | ... | The number of SQL compilations that occur each second. Values higher than 100 indicate a high proportion of ad hoc queries, which may be using up valuable CPU time. Solutions include rewriting ad hoc queries as stored procedures and using sp_executesql. This value should be as low as possible, preferably under 10% of Batch Requests/sec (see the sampling sketch below). |
| SQL Re-Compilations/sec | ... | Number of statement recompiles per second; counts the number of times statement recompiles are triggered. Generally, you want recompiles to be low. |
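The per-second counters in sys.dm_os_performance_counters are cumulative since startup, so deriving a rate takes two samples. A sketch that computes Batch Requests/sec and SQL Compilations/sec over a 10-second window (the window length is arbitrary), which you can then compare against the under-10%-of-batches guideline above:

```sql
-- Two samples 10 seconds apart turn cumulative counters into rates.
DECLARE @before TABLE (counter_name nvarchar(128), cntr_value bigint);

INSERT INTO @before
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%SQL Statistics%'
  AND counter_name IN ('Batch Requests/sec', 'SQL Compilations/sec');

WAITFOR DELAY '00:00:10';

SELECT RTRIM(p.counter_name)                AS counter_name,
       (p.cntr_value - b.cntr_value) / 10.0 AS per_second
FROM sys.dm_os_performance_counters AS p
JOIN @before AS b ON b.counter_name = p.counter_name
WHERE p.object_name LIKE '%SQL Statistics%';
```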
SQL Server Latches:

| Counter | Preferred Value | Description |
| --- | --- | --- |
| Latch Waits/sec | ... | The number of latch requests that could not be granted immediately; in other words, the number of latch requests in a one-second period that had to wait. Latches are lightweight synchronization constructs designed to protect the physical integrity of a page, in a similar way to how locks protect the logical consistency of rows. They are taken any time something wants to modify a page, be it moving the page from disk to memory or vice versa, writing a record onto a page, or changing a page's metadata. |
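To see which latch classes account for those waits, sys.dm_os_latch_stats breaks the cumulative totals down by class; a short sketch:

```sql
-- Latch wait time by latch class since startup, worst first.
SELECT latch_class,
       waiting_requests_count,
       wait_time_ms
FROM sys.dm_os_latch_stats
ORDER BY wait_time_ms DESC;
```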
SQL Server Locks:

| Counter | Preferred Value | Description |
| --- | --- | --- |
| Lock Waits/sec | ... | Reports how many times users waited to acquire a lock over the past second. Note that time actually spent waiting on a lock is not reflected in this counter; it is incremented only when you "wake up" after waiting on the lock. If this value is nonzero, it is an indication that at least some level of blocking is occurring; combined with the Lock Wait Time counter, it gives you some idea of how long the blocking lasted. A sustained zero value makes blocking unlikely as a cause, while a nonzero value requires looking at other information to determine whether it is significant. |
| Lock Timeouts (timeout > 0)/sec | ... | Number of lock requests per second that timed out, excluding requests for NOWAIT locks. Lock:Timeout events with a duration of 0 are commonly the result of internal lock probes and are not necessarily an indication of a problem; this counter ignores time-outs with a duration of 0. The LOCK_TIMEOUT session setting (readable via @@LOCK_TIMEOUT) influences this counter: it is the number of milliseconds that pass before Microsoft SQL Server returns a locking error, and a value of -1 (the default) means no time-out period, that is, wait forever (see the sketch below). |
| Deadlocks/sec | < 1 | The number of lock requests per second that end in a deadlock. This value should be less than 1. |
| Average Wait Time (ms) | < 500 ms | The average wait time in milliseconds of each lock request that had a wait time. This value should be kept under 500 ms; wait times over 500 ms may indicate blocking. |
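The LOCK_TIMEOUT behavior described above is session-scoped and easy to demonstrate; a minimal sketch (the 5000 ms value is arbitrary):

```sql
-- Default: @@LOCK_TIMEOUT = -1, i.e. wait on locks forever.
SELECT @@LOCK_TIMEOUT AS current_timeout_ms;

-- Make this session give up after 5 seconds. A statement blocked
-- longer than that fails with error 1222 and increments the
-- Lock Timeouts (timeout > 0)/sec counter.
SET LOCK_TIMEOUT 5000;
SELECT @@LOCK_TIMEOUT AS current_timeout_ms;  -- now 5000
```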
SQL Server Errors:

| Counter | Preferred Value | Description |
| --- | --- | --- |
| Errors/sec (User Errors) | ~0 | Number of user errors per second. |