Max Server Memory and the 300 Second Rule

By | In Blog, SQL Server | December 11th, 2013

Up until recently the commonly understood and accepted ideal number for page-life-expectancy (PLE) for a SQL Server instance was 300 seconds (5 minutes) or greater. Recently, the Microsoft rule-of-thumb from the days of SQL 2000 and 4GB RAM is being rethought, some might say debunked.

Jonathan Kehayias offers a good explanation which says in summary, in these days of 16GB, 32GB, 64GB, and above RAM and correspondingly more available memory for the buffer pool, a PLE of only 300 seconds might indicate some significant resource usage and be a cause of concern.

Churning

In fact, even back when servers didn’t typically have as much RAM and instances had limited buffer pool sizes compared with today, the 300 second rule was of limited value as a stand-alone metric. Like many counters – from perfmon as well as metrics acquired from DMV queries – its meaning comes from its relation with other, associated metrics.

In the case of PLE, this performance counter value can be compared to the current buffer pool size to give an approximate rate of buffer pool churn (i.e. a measure of buffer pool data written to disk per second). For example, consider a 64bit SQL 2008 EE instance with 128GB RAM. Maximum memory for the instance is configured at 120GB and we can see that in the Target Server Memory performance counter. So to get MB of buffer pool writes per second we can use this equation:

BPool_Churn_in_MB_per_sec = (Target Server Memory KB/1024)/PLE

Here is a quick sample query to collect that data:

(*note: PLE per NUMA node is not considered here)

So, given this example we would discover a Churn rate of ~400MB/sec which –depending, of course, on other factors – could be considered a fairly high rate of sustained disk activity due to disk writes from the buffer pool.

128GB is Real

Yes, servers with 128GB of RAM do exist. However, in my experience with the day-to-day workhorses in the SQL wild, 16GB, or the very common 8GB VM, or even 64GB of RAM is much more common than machines with 100GB+ RAM. So, in fact the 300 rule might still be considered a pretty good ballpark memory pressure indicator for servers in the 8GB-64GB range. But the point is not that a good portion of real world servers are 8-16GB VMs and so the 300 second rule is good enough, the point is that we can look deeper and with a churn rate metric we have a meaningful number on which to base configuration decisions. For one potential use scenario for this metric read on.

In the Wild

Most would agree that, in general, stacking SQL instance on the same server is not ideal. But due to poor design, consolidation efforts, security policies or a number of other reasons this design is out there and a number of SQL instances may be sharing the same server resources, including RAM.

To understand how the instances are sharing the server RAM, we can run the query above as a multi-server query so as to compare buffer pool churn. Here, utilizing the churn metric, we are beginning to develop a method to tune each instance’s minimum and maximum memory configurations for the best performance possible given the shared resource situation. Referring to the churn metric, we want to maximize available memory for each instance without compromising – too much – the memory available for the other instance(s).

To flesh out the query we can use instance name as a join condition and include a few other metrics of interest. Also, let’s drop the Churn MB units to KB (so we are more likely to get a non-zero number for qualitative comparison). Lastly, note we are using ‘Total Server Memory’ to calculate the Churn since – especially considering potential memory pressure from the other instances – this is the truer (though not perfect) representation of the size of the buffer pool.

Here are a couple example results from servers hosting 3 instances:

Server 1:

Instance CPUs MemMB MinMemMB MaxMemMB TargetSMemMB TotalMemMB PLE ChurnKB/sec
Instance1 8 16383 4600 4600 4600 4600 398486 11
Instance2 8 16383 4600 4600 4600 4600 25544 184
Instance3 8 16383 4600 4600 4600 4600 2672 1762

Server 2:

Instance CPUs MemMB MinMemMB MaxMemMB TargetMemMB TotalMemMB PLE ChurnKB/sec
InstanceA 4 16383 1024 2048 2048 2048 86002 24
InstanceB 4 16383 2048 8192 8192 8192 379 22133
InstanceC 4 16383 1024 1024 1024 1024 71931 14

Just quickly scanning these results one might consider reconfiguring the even split of memory on Server1’s instances. InstanceA max memory can perhaps be reduced and the more heavily churned Instance3 max memory increased. On the other hand, on Server2 one might argue that even though Instances A and C have a max memory setting that is tuned quite low and InstanceB is getting the greater slice, InstanceB is still reporting a high relative churn (note the PLE is approaching the 300 mark). Perhaps the server needs more RAM or InstanceB should have a dedicated server.

Decisions, Decisions

Granted these are only snapshot values. These metrics can change often and may vary widely over a day, a week, or a month. A better, more interesting evaluation would include results from these counters, as well as a few others, over some range of time and as the configurations are adjusted – but I’ll save that for another post.

To conclude, the 300 rule turns out to be of some use, but just looking at PLE on these instances we might not be able to come to a reasonable judgment on how to distribute the RAM resource. It is only possible by looking in-depth beyond the 300 second rule to consider the churn rate and, more generally, all the pertinent info; not just PLE, but including (most importantly) buffer pool size and (most specifically) the current buffer pool size in use, as well as minmax server memory and available RAM and CPU count. We need this whole picture presented before us so we can compare, contrast and decide.

Contact Us
Andy McDermid
Principal SQL Server DBA
Andy is a MCITP certified MS SQL DBA who delivers and manages delivery of DBA services to many diverse clients. He enjoys helping his clients by finding and deploying pragmatic and practical solutions for their database issues. Andy is always working to improve and expand his DBA skills and he likes to share the experience via writing.

Leave a Reply

Your email address will not be published.
Required fields are marked (*).

4 thoughts on “Max Server Memory and the 300 Second Rule”
    1. Hi Dinesh,

      In general. to increase PLE increase the size of the buffer pool via the max memory setting. Adjusting the max memory setting to optimize PLE across multiple instances is what is discussed in this post. If increasing max memory is not possible, look into what queries might be using large portions of the buffer pool and thereby forcing out other data. Tune these queries or index for them so as to reduce their buffer pool usage and PLE should increase accordingly.

  1. Hi – check out all the ‘_logical_reads’ columns in sys.dm_exec_query_stats. Since each logical reads represent a page read from the buffer pool, the queries with the highest logical reads at a given time are also using the most space in the buffer pool.

    If, on the other hand, you are interested in which queries have requested the most memory to carry through with their execution plans check out sys.dm_exec_query_memory_grants which has interesting columns like granted_memory_kb.
    Both of these DMVs can be cross applied to sys.dm_exec_sql_text andor sys.dm_exec_query_plan to return the sql text andor query plan.
    You’ll find many good examples on using these DMVs online.
    Thanks for your question.