All things are possible with coffee, and sufficient cores.
All things are possible with coffee, and sufficient cores.
I’m British, and we’re used to waiting in line for things, but why is my SQL Server waiting?
This is a question that comes up a lot, and is a lesson 101 problem for any Database Admin. We know that SQL Server waits, but is this bad? Should we be doing something, or buying something? What do these ‘waits’ really mean, and how do I get rid of them?
Well, let’s take a breath and not panic. SQL Server waits for many reasons, and they’re not all something to worry about! In fact, many of them can be ignored all together, and if you’re weren’t seeing any waits whatsoever, then you’d probably be better off saving yourself some cash and sticking your data in an Excel spreadsheet.
But no one likes waiting, and reducing these waits is good, so let’s examine SQL Server Waits in more detail. We’ll start by talking about what we mean by a ‘wait’, then go on to talk about why they occur, before looking at some useful scripts to help us identify them. Finally, we’ll talk about what actions we could take to reduce these waits.
What is a Wait?
SQL Server executes your queries using something known as a ‘worker thread’. Put simply – if one of these threads wants to execute, but a required resource is not available, it must wait! Any active threads that are not currently executing, are classed as ‘waiting’, and of course if a thread is waiting, our query will take longer to complete.
When these threads ‘wait’, SQL Server logs lots of super-useful information, such as how long it’s been waiting, and why it’s waiting (known as the wait ‘type’). Once we understand it, we can use this information to make informed decisions on how to reduce these wait times.
What can cause waits?
Let’s think about what happens when SQL Server executes a query
So considering the above, let’s think about what might cause SQL Server to wait:
Those old spindly disks just can’t keep up sometimes! If SQL Server must read the pages it needs from disk, your thread is going to relinquish its CPU time and wait until those data pages are loaded into memory.
SQL Server manages access to the data pages by a series of locks. For example, by default your SELECT statement will need to place a SHARED lock of some sort on the required data pages to ensure they’re not modified by any other transaction until its complete. However, the execution thread could be blocked from achieving this by another thread, which is holding an incompatible lock. This behaviour is critical to maintaining the integrity of the database transactions, and guess what? If your thread can’t acquire the locks it needs, it’s going to have to sit there and wait.
Finally, of course your thread needs time on the CPU to be executed. If a thousand other requests also need executing, your query may just have to wait in line for its turn (or more specifically – a queue – more on this below)!
How SQL Server manages a waiting thread
Whenever a thread is ‘active’, SQL Server holds it in one of three states; Runnable, Running or Suspended. Throughout its lifecycle, your thread will likely pass through all 3 of these states at least once.
It will come as no surprise that ‘Running’ means your thread is being executed, and the other two mean it isn’t! Both ‘Runnable’ and ‘Suspended’ are ‘waiting’ states, and as mentioned earlier, SQL Server will track the time your thread spends in this state.
You may find that your thread sits in one of the ‘waiting’ states for a prolonged period, and you need to know why! Let’s examine each state in more detail.
Threads in the runnable state are not waiting on anything, other than CPU time. They’re sat in a queue, and will be dealt with on a First-In-First-Out (FIFO) basis.
Seeing threads in the runnable state is completely normal on even a moderately busy system. However, if threads are staying in this state for long periods, this is more likely a sign of CPU pressure. A CPU with 32 logical cores can only execute 32 threads at any one time, so anything beyond that is going to cause a wait!
As mentioned, the Runnable queue is a FIFO queue, so the thread at the ‘head’ of this queue will eventually get some CPU time, and move into the Running state. Your query is being executed! Woohoo! However…
When your thread is running on the CPU, it doesn’t get to just run and run until it’s done. It’s actually given just 4ms (yes, that’s milliseconds!) of CPU time before it’s force to relinquish control. This could be even shorter if it needs a resource at some point that it has to wait for (such as a lock, or disk I/O). Once that happens, the thread moves onto the Suspended list (otherwise known as the ‘waiter’ list).
Note that this is a list, as opposed to a queue – there’s no knowing when your resource will be available, and your thread must wait until it is!
If you consistently see high numbers of threads in the Suspended state then you’ve got contention issues on a resource other than CPU (more on this below).
Once the resource is available, your thread will move back onto the Runnable queue, and the cycle continues…
Suspended -> Runnable -> Running (4ms) -> Suspended -> Runnable – Running (4ms)-> etc…
How do I find out why my SQL Server is waiting?
So now we understand waits, let’s examine how we can identify what types of waits our server is suffering from.
This is where SQL Server itself can help us. We can check the current state of our thread by checking the following dynamic management view (DMV) while our query is executing:[code language=”sql”]SELECT * FROM sys.dm_exec_requests where session_id > 50[/code]
(note – session_id 50 or below are internal processes and can generally be ignored for this purpose)
That will give us some useful information, but it usually shows us an ever-changing picture, and doesn’t really give us enough to start troubleshooting. There are a couple of other DMVs for diagnosing wait stats which you should be aware of, and when the 3 are used in combination, they become extremely useful:
This DMV returns information about all the waits encountered by threads that have executed. It gives an historical view of the data (since the server was started, or stats were purged), rather than a current view.
This DMV returns information about what tasks are in the waiter list (Suspended) now i.e. waiting on some resource. It gives a current view of the data, and therefore is most useful when the system is currently experiencing blocking issues.
There are a tonne of scripts out there, written by some very clever members of the SQL Server community, which do a lot of the work for you and really take the pain out of understanding what on earth is going on! I’m not here to state which ones you should and shouldn’t use, but these are the ones that I keep in my locker:
Paul Randal – Tell me where it hurts!
This is a very useful script which wraps some aggregation code around dm_os_wait_stats and shows you the cumulative wait stats for your server.
Pinal Dave – Current running queries
Pinal Dave’s blog is a treasure-trove of useful scripts. This particular script is actually very simple – it’s an expanded version of sys.dm_exec_requests that throws in the actual query text for good measure, so you can see exactly what queries are currently being executed on your server.
Pedro Lopes – SQL Swiss Army Knife – Locking, blocking and active transactions
As the names suggests, this script gives you pretty much everything you need for diagnosing current waits, and it does this by combining most of the key DMVs. It will show you what’s running right now, what’s waiting, what’s being blocked, and lots of other useful information.
Brent Ozar – Triage Wait Stats in SQL Server
This script gives you the best of both worlds – it will tell you the overall wait times on your server since it was started, and then give you a snapshot of your current waits. Brent’s blog also includes some links to sp_WhoIsActive from Adam Machanic, which I would definitely recommend you check out.
Interpreting wait stats – Perfmon is your friend
Fixing waits is rarely a case of just running a DMV and saying “Ah yes, there’s the problem”, however we can typically draw some reasonable conclusions from what we find.
As mentioned, if threads are remaining in the Runnable state for a long time, then you’ve likely got CPU pressure. Running a Perfmon trace and monitoring the System – Processor Queue Length and Processor – % Time counters will help confirm this. Also, look at the signal_wait_time_ms column from sys.dm_os_wait_stats – this will tell you how much of the cumulative wait time was spent solely on waiting for CPU.
If reducing the workload on your server is not an option (it rarely is), then upgrading your CPUs might be your only choice. Remember, however, that waiting is normal. So don’t start filling out purchase requests unless they’re actually causing you a problem.
Threads that are Suspended are waiting on something other than CPU. Don’t make the mistake of instantly thinking, “OK, I need faster disks”, or “I need to buy more memory”. We usually have to introduce a further level of monitoring to come to these conclusions with any certainty, and for this I usually turn to Perfmon again. Brent Ozar has written a great blog on the best SQL Server Perfmon counters to analyze, and how to draw the right conclusions.
Waits are normal, and necessary, but should not be ignored. Navigating your way through the maze of information and identifying where improvements should be made can be somewhat challenging. It’s important to take a holistic look at what’s occurring on your server before jumping to any conclusions. The starting point is to just ask SQL Server – hey, what are you waiting for?