This post is based on a document I wrote a long time ago as part of my DBA knowledge transfer to other DBAs. It’s also goes hand in hand with a question I like to ask in interviews:
‘Someone has raised a ticket with your IT Service Desk and assigned it to you. It simply states ‘The database isn’t working’, what is your logic/method to try and narrow down where the problem is and what the problem may be?
There’s no right or wrong answer to this but it gives me a good idea of how someone thinks about an issue and whether they have a set-based routine to investigation or more of a scattergun approach.
Troubleshooting Performance Problems
Performance can depend on many factors, both Hardware and Software. The first step is to diagnose where the problem stems from which you can do by monitoring your hardware and databases. Most problems tend to be caused by CPU, Memory, Network, I/O bottlenecks and slow running queries.
I would first run Windows Performance Monitor (or a similar tool) to help identify where the problem lies. The problem might not actually be with the databases themselves so it’s best practice to check your hardware for any other issues. With Virtual servers, I find this particularly important. My servers have regularly changed VM host where the CPU and Memory resources available have been different to our recommended specifications and therefore have affected performance. I check the following performance counters for prolonged high values which would normally indicate a bottleneck:
- PhysicalDisk: AVG. disk queue length – These are the number of I/O operations waiting to be read or written from disk. You should expect this value never to peak above 2.
- Average Disk sec/read – average time in seconds needed to read data from a disk. Under 8ms is excellent performance. Anything above 20ms indicates I/O issues
- Average Disk sec/Write – average time in seconds to write data to disk. Performance below 1ms is excellent. Anything above 4ms indicates I/O issues.
- Memory: available Mbytes – amount of available memory on the server
- Paging File: % usage – Captures file usage and ideally should be a low value. – the higher the value, the less memory is available.
- Network Interface: Bytes Total/Sec – total number of bytes sent and received over a network connection for each second
- Network Interface: Current Bandwidth – records actual capacity of a network card.
- Processor: % Processor Time – Indicates the % of time that the processor spends processing workloads. If value is consistently above 80%, the CPUs could represent a bottleneck in the system.
- System Processor Queue Length – indicates the number of threads that are waiting for CPUs to become available so they can be processed.
In conjunction with these Server Performance Counters, I also use the following SQL Counters:
- SQL Server: Buffer Manager: Page reads/sec and SQL server: Buffer Manager: Page writes/sec – Both these counters monitor SQL performance in reading pages from disk or writing pages to a disk. If the values of these counters reaches capacity for the hardware you can reduce the values by tuning your application or database with Index coverage, better indexes, defragging indexes or normalization). You can also increase the I/O capacity or add more memory.
- SQL Server: Buffer Manager: Lazy writes/sec – If this value is consistently high and above 20 and page life expectancy is below 300 seconds then you may have a memory issues.
- SQL Server SQL Statistics: SQL compilations/sec and SQL Server SQL Statistics: SQL Re-compilations/sec. You can use these counter to track the number of times SQL Server compiles and recompiles execution plans. Compiling an execution plan can be resource-intensive so you typically want to see a small number of compilations and recompilations. You can compare the SQL Server Statistics: SQL Compilations/sec counter against the SQL Server SQL Statistics: Batch Requests/sec counter to see how many of the batches that are submitted to the server require a compilation. The number of recompilations should be significantly lower than the number of compilations, ideally about 10%. If this figure is significantly higher, you should investigate the cause of the recompilations.
Once diagnosing the problem to be with a Database or SQL Instance, you can then start looking under the hood for the problem:
- SQL Agent – Check to see if any SQL Jobs are running – many times we have had maintenance jobs still running from the night before which are causing system problems
- SQL Error Log – check to see if any errors have appeared here that may point to where the problem lies
- SQL Activity Monitor – Check Processes for all current sessions, Waiting tasks for your wait statistics, Data File I/O to see which files/databases may be under pressure or using all resources, and then check to see if any recent expensive queries may be causing an issue (also check query plans here). If SQL is under pressure this tool will not always work!
- Check for Blocks, Locks and Waits – Use SP_Who2 (or even better, SP_WHOISACTIVE) to identify the sessions currently running, their current status and if any blocking is happening and on what database (as well as CPU and IO times). SP_LOCK will tell you the number, lock types (row, table, schema) and lock modes (shared, update, exclusive, intent, bulk) – Simply counting the number of locks and types for a SPID can point to a poorly performing query which may not even be blocking other processes but can still cause major performance issues.
Waits Reference Guide: https://www.sqlskills.com/help/waits
Dynamic Management Views
Example SQLs using the DMVs
SYS.dm_exec_requests – detailed information about requests currently executing on SQL Server. It gives you information about the query, query plan, status, isolation level, and lock_timeout. You can run a CROSS APPLY with sys.DM_exec_sql_text to find the actual SQL statements running.
Example (currently executing quries):
SELECT s.session_id, S.login_name, S.host_name, S.program_name, R.command, T.text, R.wait_type, R.wait_time, R.blocking_session_id FROM sys.dm_exec_requests AS R INNER JOIN sys.dm_exec_sessions AS S ON R.session_id = S.session_id OUTER APPLY sys.dm_exec_sql_text(R.sql_handle) AS T WHERE S.is_user_process = 1;
SYS.dm_tran_locks – to view information about current locks and processes blocking them. Request_status is key – Look for rows with the status of CONVERT as the requester has been granted a request but is waiting to upgrade to the initial request to be granted.
Example (locks per database):
USE [master] GO SELECT tl.resource_type ,req.blocking_session_id ,tl.resource_associated_entity_id ,tl.request_status ,tl.request_mode ,tl.request_session_id ,tl.resource_description ,db.name AS 'Database Name' ,ex_ses.status FROM sys.dm_tran_locks tl INNER JOIN sys.databases AS db on tl.resource_database_id = db.database_id INNER JOIN sys.dm_exec_requests as req ON tl.request_session_id= req.session_id INNER JOIN sys.dm_exec_sessions ex_ses ON tl.request_session_id = ex_ses.session_id WHERE tl.resource_database_id > 4 -- view user databases only GO
Sys.dm_os_waiting_tasks – Reports information about blocked and blocking processes. – It gives you session ids, waiting durations, cpu and memory usage, blocking sessions. Run this in conjunction with sys.dm_exec_requests and sys.dm_exec_sql text to view wait stats for all blocked processes. If you simply run a query on sys.dm_os_waiting_tasks, the values should be constantly changing, each time you run it. If they are not, then you have most likely got a bottleneck somewhere.
Sys.dm_exec_sessions – shows information about all active user connections and internal tasks. This information includes client version, client program name, client login time, login user, current session setting, and more. Use sys.dm_exec_sessions to first view the current system load and to identify a session of interest.
Example (number of sessions per user):
SELECT login_name ,COUNT(session_id) AS session_count FROM sys.dm_exec_sessions GROUP BY login_name;
sys.dm_tran_session_transactions – lists out the open transactions for a single session. Use in conjunction with sys.dm_tran_database_transactions.
Example (Idle Sessions with open transactions):
SELECT s.* FROM sys.dm_exec_sessions AS s WHERE EXISTS ( SELECT * FROM sys.dm_tran_session_transactions AS t WHERE t.session_id = s.session_id ) AND NOT EXISTS ( SELECT * FROM sys.dm_exec_requests AS r WHERE r.session_id = s.session_id );
sys.dm_exec_connections – Returns information about the connections established to this instance of SQL Server and the details of each connection.
SELECT c.session_id, c.net_transport, c.encrypt_option, c.auth_scheme, s.host_name, s.program_name, s.client_interface_name, s.login_name, s.nt_domain, s.nt_user_name, s.original_login_name, c.connect_time, s.login_time FROM sys.dm_exec_connections AS c JOIN sys.dm_exec_sessions AS s ON c.session_id = s.session_id
SQL Profiler (now deprecated) – This is still a favourite for a lot of DBAs although I prefer not to run it on production environments because it can cause a performance hit itself. There are plenty of of other posts on how to set up profiler traces but I don’t want to encourage the use of a deprecated product!
Extended Events – This has got much better over the years and was the replacement for profiler. I’ve written a previous post on Extended Events Sessions that I use.
Milena Petrovic – SQL Server Performance Troubleshooting
Brent Ozar – SQL Server Perfmon Counters Tutorial
MSSQLTips – Performance Monitor
SentryOne – SQL Server Performance Counters to Monitor