
Max Server Memory and the 300 Second Rule

Andy McDermid | December 11, 2013

Until recently, the commonly understood and accepted ideal number for page life expectancy (PLE) on a SQL Server instance was 300 seconds (5 minutes) or greater. Now that Microsoft rule of thumb, dating from the days of SQL 2000 and 4GB of RAM, is being rethought, some might say debunked.

Jonathan Kehayias offers a good explanation. In summary: in these days of 16GB, 32GB, 64GB, and more RAM, and a correspondingly larger buffer pool, a PLE of only 300 seconds can indicate significant resource usage and should be a cause for concern.

Churning

In fact, even back when servers typically had less RAM and instances ran smaller buffer pools than today, the 300-second rule was of limited value as a stand-alone metric. Like many counters, whether from perfmon or from DMV queries, its meaning comes from its relationship to other, associated metrics.

In the case of PLE, this performance counter value can be compared to the current buffer pool size to give an approximate rate of buffer pool churn (i.e. a measure of how much buffer pool data is replaced, and so must be re-read from disk, per second). For example, consider a 64-bit SQL 2008 EE instance on a server with 128GB RAM. Max server memory for the instance is configured at 120GB, and we can see that value in the Target Server Memory performance counter. So, to get MB of buffer pool churn per second, we can use this equation:

BPool_Churn_in_MB_per_sec = (Target Server Memory KB/1024)/PLE

Here is a quick sample query to collect that data:

select Instancename,
    TargetServerMemoryKB/1024 as 'TargetServerMemoryMB',
    TotalServerMemoryKB/1024 as 'TotalServerMemoryMB',
    PLE,
    -- nullif guards against a divide-by-zero right after instance startup
    (TargetServerMemoryKB/1024)/nullif(PLE, 0) as 'ChurnMB/sec'
from (
    select @@SERVERNAME as 'Instancename',
    max(case when counter_name = 'Target Server Memory (KB)' then cntr_value end) as 'TargetServerMemoryKB',
    max(case when counter_name = 'Total Server Memory (KB)' then cntr_value end) as 'TotalServerMemoryKB',
    max(case when counter_name = 'Page life expectancy' then cntr_value end) as 'PLE'
    from sys.dm_os_performance_counters
    -- limit to the instance-level objects so the per-NUMA-node Buffer Node PLE is excluded
    where object_name like '%Buffer Manager%' or object_name like '%Memory Manager%'
) as p

(Note: PLE per NUMA node is not considered here; on NUMA hardware each Buffer Node object reports its own PLE.)
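For completeness, here is a minimal sketch of how the per-node values could be pulled from the Buffer Node object alongside the overall Buffer Manager value (on non-NUMA hardware you will typically see a single node):

-- overall PLE (Buffer Manager) plus per-NUMA-node PLE (Buffer Node)
select object_name, instance_name as 'Node', cntr_value as 'PLE'
from sys.dm_os_performance_counters
where counter_name = 'Page life expectancy'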

So, given this example, we would discover a churn rate of ~400MB/sec which, depending of course on other factors, could be considered a fairly high rate of sustained disk activity as pages are cycled through the buffer pool.
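To make that arithmetic concrete, here is the same calculation as a quick one-off query (120GB target, PLE of 300):

-- 120GB target = 122880MB; at PLE = 300 the whole pool turns over every 300 seconds
select (120 * 1024.0) / 300 as 'ChurnMB/sec'  -- returns 409.6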

128GB is Real

Yes, servers with 128GB of RAM do exist. However, in my experience the day-to-day workhorses in the SQL wild, with 16GB, the very common 8GB VM, or even 64GB of RAM, far outnumber machines with 100GB+ of RAM. So, in fact, the 300 rule might still be considered a pretty good ballpark memory-pressure indicator for servers in the 8GB-64GB range. But the point is not that a good portion of real-world servers are 8-16GB VMs and so the 300-second rule is good enough; the point is that we can look deeper, and with a churn-rate metric we have a meaningful number on which to base configuration decisions. Read on for one potential use scenario for this metric.

In the Wild

Most would agree that, in general, stacking SQL Server instances on the same server is not ideal. But due to poor design, consolidation efforts, security policies, or a number of other reasons, this design is out there, and a number of SQL instances may be sharing the same server resources, including RAM.

To understand how the instances are sharing the server's RAM, we can run the query above as a multi-server query (for example, against a registered server group in SSMS) and compare buffer pool churn across instances. With the churn metric we can begin to develop a method for tuning each instance's minimum and maximum memory configurations for the best performance possible given the shared-resource situation: we want to maximize available memory for each instance without compromising, too much, the memory available for the other instance(s).

To flesh out the query, we can use the instance name as a join condition and include a few other metrics of interest. Also, let's change the churn units from MB to KB, so we are more likely to get a non-zero number for comparison. Lastly, note that we now use Total Server Memory to calculate the churn since, especially considering potential memory pressure from the other instances, it is the truer (though not perfect) representation of the current buffer pool size.

;with cte as (
    select @@SERVERNAME as 'Instancename',
    max(case when name = 'min server memory (MB)' then value_in_use end) as 'MinServerMemoryMB',
    max(case when name = 'max server memory (MB)' then value_in_use end) as 'MaxServerMemoryMB'
    from sys.configurations
)
select p.Instancename,
    os.PhysicalCPUCount, os.PhysicalMemoryMB,
    c.MinServerMemoryMB, c.MaxServerMemoryMB,
    p.TargetServerMemoryKB/1024 as 'TargetServerMemoryMB',
    p.TotalServerMemoryKB/1024 as 'TotalServerMemoryMB',
    p.PLE,
    p.TotalServerMemoryKB/nullif(p.PLE, 0) as 'ChurnKB/sec'
from (
    select @@SERVERNAME as 'Instancename',
    max(case when counter_name = 'Target Server Memory (KB)' then cntr_value end) as 'TargetServerMemoryKB',
    max(case when counter_name = 'Total Server Memory (KB)' then cntr_value end) as 'TotalServerMemoryKB',
    max(case when counter_name = 'Page life expectancy' then cntr_value end) as 'PLE'
    from sys.dm_os_performance_counters
    where object_name like '%Buffer Manager%' or object_name like '%Memory Manager%'
) as p
join cte c on p.Instancename = c.Instancename
join (
    select @@SERVERNAME as 'Instancename',
    cpu_count as 'LogicalCPUCount',
    hyperthread_ratio as 'HyperthreadRatio',
    cpu_count/hyperthread_ratio as 'PhysicalCPUCount',
    -- note: on SQL Server 2012 and later this column is named physical_memory_kb
    physical_memory_in_bytes/1048576 as 'PhysicalMemoryMB'
    from sys.dm_os_sys_info
) as os on c.Instancename = os.Instancename

Here are a couple of example results, each from a server hosting three instances:

Server 1:

Instance   CPUs  MemMB  MinMemMB  MaxMemMB  TargetMemMB  TotalMemMB  PLE     ChurnKB/sec
Instance1  8     16383  4600      4600      4600         4600        398486  11
Instance2  8     16383  4600      4600      4600         4600        25544   184
Instance3  8     16383  4600      4600      4600         4600        2672    1762

Server 2:

Instance   CPUs  MemMB  MinMemMB  MaxMemMB  TargetMemMB  TotalMemMB  PLE     ChurnKB/sec
InstanceA  4     16383  1024      2048      2048         2048        86002   24
InstanceB  4     16383  2048      8192      8192         8192        379     22133
InstanceC  4     16383  1024      1024      1024         1024        71931   14

Just quickly scanning these results, one might consider reconfiguring the even split of memory across Server1's instances: Instance1's max memory can perhaps be reduced and the more heavily churned Instance3's max memory increased. On Server2, on the other hand, one might argue that even though InstanceA and InstanceC have max memory settings tuned quite low and InstanceB is getting the greater slice, InstanceB is still reporting a high relative churn (note its PLE is approaching the 300 mark). Perhaps the server needs more RAM, or InstanceB should have a dedicated server.
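If we did decide to shift memory between instances, the change itself is straightforward with sp_configure. A minimal sketch (the 6144 value below is purely illustrative, not a recommendation):

-- run against the instance whose cap is changing (value is illustrative)
exec sys.sp_configure 'show advanced options', 1;
reconfigure;
exec sys.sp_configure 'max server memory (MB)', 6144;
reconfigure;

The new max server memory setting takes effect without a restart, though the buffer pool may take some time to grow or shrink to the new target.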

Decisions, Decisions

Granted, these are only snapshot values. These metrics can change often and may vary widely over a day, a week, or a month. A better, more interesting evaluation would track these counters, as well as a few others, over some range of time and as the configurations are adjusted, but I'll save that for another post.
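As a head start on that kind of trending, a minimal sketch might snapshot the same counters into a history table from a scheduled job (the table and column names here are my own, purely illustrative):

-- hypothetical history table for periodic churn snapshots
create table dbo.BPoolChurnHistory (
    CollectedAt datetime not null default (getdate()),
    Instancename sysname not null,
    TotalServerMemoryKB bigint null,
    PLE bigint null,
    ChurnKBPerSec bigint null
);

-- scheduled step (e.g. a SQL Agent job running every few minutes)
insert dbo.BPoolChurnHistory (Instancename, TotalServerMemoryKB, PLE, ChurnKBPerSec)
select @@SERVERNAME,
    max(case when counter_name = 'Total Server Memory (KB)' then cntr_value end),
    max(case when counter_name = 'Page life expectancy' then cntr_value end),
    max(case when counter_name = 'Total Server Memory (KB)' then cntr_value end)
        / nullif(max(case when counter_name = 'Page life expectancy' then cntr_value end), 0)
from sys.dm_os_performance_counters
where object_name like '%Buffer Manager%' or object_name like '%Memory Manager%';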

To conclude, the 300 rule turns out to be of some use, but looking at PLE alone on these instances we might not be able to come to a reasonable judgment on how to distribute the RAM. That judgment is only possible by looking beyond the 300-second rule to the churn rate and, more generally, at all the pertinent information: not just PLE, but (most importantly) the configured buffer pool size, (most specifically) the current buffer pool size in use, the min/max server memory settings, the available RAM, and the CPU count. With that whole picture in front of us, we can compare, contrast, and decide.
