Select Page

Max Server Memory and the 300 Second Rule

Andy McDermid | | December 11, 2013

Up until recently the commonly understood and accepted ideal number for page-life-expectancy (PLE) for a SQL Server instance was 300 seconds (5 minutes) or greater. Recently, the Microsoft rule-of-thumb from the days of SQL 2000 and 4GB RAM is being rethought, some might say debunked.

Jonathan Kehayias offers a good explanation which says in summary, in these days of 16GB, 32GB, 64GB, and above RAM and correspondingly more available memory for the buffer pool, a PLE of only 300 seconds might indicate some significant resource usage and be a cause of concern.

Churning

In fact, even back when servers didn’t typically have as much RAM and instances had limited buffer pool sizes compared with today, the 300 second rule was of limited value as a stand-alone metric. Like many counters – from perfmon as well as metrics acquired from DMV queries – its meaning comes from its relation with other, associated metrics.

In the case of PLE, this performance counter value can be compared to the current buffer pool size to give an approximate rate of buffer pool churn (i.e. a measure of buffer pool data written to disk per second). For example, consider a 64bit SQL 2008 EE instance with 128GB RAM. Maximum memory for the instance is configured at 120GB and we can see that in the Target Server Memory performance counter. So to get MB of buffer pool writes per second we can use this equation:

BPool_Churn_in_MB_per_sec = (Target Server Memory KB/1024)/PLE

Here is a quick sample query to collect that data:

select Instancename, 
TargetServerMemoryKB/1024 'TargetServerMemoryMB',
TotalServerMemoryKB/1024 as 'TotalServerMemoryMB', 
PLE, (TargetServerMemoryKB/1024)/PLE as 'ChurnMB/sec'
from(
select @@SERVERNAME as 'Instancename',
max(case when counter_name = 'Target Server Memory (KB)' then cntr_value end) as 'TargetServerMemoryKB',
max(case when counter_name = 'Total Server Memory (KB)' then cntr_value end) as 'TotalServerMemoryKB',
max(case when counter_name = 'Page life expectancy'  then cntr_value end) as 'PLE'
from sys.dm_os_performance_counters) 
as p

(*note: PLE per NUMA node is not considered here)

So, given this example we would discover a Churn rate of ~400MB/sec which –depending, of course, on other factors – could be considered a fairly high rate of sustained disk activity due to disk writes from the buffer pool.

128GB is Real

Yes, servers with 128GB of RAM do exist. However, in my experience with the day-to-day workhorses in the SQL wild, 16GB, or the very common 8GB VM, or even 64GB of RAM is much more common than machines with 100GB+ RAM. So, in fact the 300 rule might still be considered a pretty good ballpark memory pressure indicator for servers in the 8GB-64GB range. But the point is not that a good portion of real world servers are 8-16GB VMs and so the 300 second rule is good enough, the point is that we can look deeper and with a churn rate metric we have a meaningful number on which to base configuration decisions. For one potential use scenario for this metric read on.

In the Wild

Most would agree that, in general, stacking SQL instance on the same server is not ideal. But due to poor design, consolidation efforts, security policies or a number of other reasons this design is out there and a number of SQL instances may be sharing the same server resources, including RAM.

To understand how the instances are sharing the server RAM, we can run the query above as a multi-server query so as to compare buffer pool churn. Here, utilizing the churn metric, we are beginning to develop a method to tune each instance’s minimum and maximum memory configurations for the best performance possible given the shared resource situation. Referring to the churn metric, we want to maximize available memory for each instance without compromising – too much – the memory available for the other instance(s).

To flesh out the query we can use instance name as a join condition and include a few other metrics of interest. Also, let’s drop the Churn MB units to KB (so we are more likely to get a non-zero number for qualitative comparison). Lastly, note we are using ‘Total Server Memory’ to calculate the Churn since – especially considering potential memory pressure from the other instances – this is the truer (though not perfect) representation of the size of the buffer pool.

;with cte as (
select @@SERVERNAME as 'Instancename',
max(case when name = 'min server memory (MB)' then value_in_use end) as 'MinServerMemoryMB',
max(case when name = 'max server memory (MB)' then value_in_use end) as 'MaxServerMemoryMB'
from sys.configurations)
select p.Instancename, 
os.PhysicalCPUCount, os.PhysicalMemoryMB,
c.MinServerMemoryMB,c.MaxServerMemoryMB,
p.TargetServerMemoryKB/1024 as 'TargetServerMemoryMB',
p.TotalServerMemoryKB/1024 as 'TotalServerMemoryMB',
p.PLE , (p.TotalServerMemoryKB)/p.PLE as 'ChurnKB/sec'
from(
select @@SERVERNAME as 'Instancename',
max(case when counter_name = 'Target Server Memory (KB)' then cntr_value end) as 'TargetServerMemoryKB',
max(case when counter_name = 'Total Server Memory (KB)' then cntr_value end) as 'TotalServerMemoryKB',
max(case when counter_name = 'Page life expectancy'  then cntr_value end) as 'PLE'
from sys.dm_os_performance_counters) 
as p
join cte c on p.instancename = c.instancename
join
(SELECT @@SERVERNAME as 'Instancename', cpu_count AS 'LogicalCPUCount', hyperthread_ratio AS 'HyperthreadRatio',
cpu_count/hyperthread_ratio AS 'PhysicalCPUCount', 
physical_memory_in_bytes/1048576 AS 'PhysicalMemoryMB' 
FROM sys.dm_os_sys_info ) as os
on c.instancename=os.instancename

Here are a couple example results from servers hosting 3 instances:

Server 1:

Instance CPUs MemMB MinMemMB MaxMemMB TargetSMemMB TotalMemMB PLE ChurnKB/sec
Instance1 8 16383 4600 4600 4600 4600 398486 11
Instance2 8 16383 4600 4600 4600 4600 25544 184
Instance3 8 16383 4600 4600 4600 4600 2672 1762

Server 2:

Instance CPUs MemMB MinMemMB MaxMemMB TargetMemMB TotalMemMB PLE ChurnKB/sec
InstanceA 4 16383 1024 2048 2048 2048 86002 24
InstanceB 4 16383 2048 8192 8192 8192 379 22133
InstanceC 4 16383 1024 1024 1024 1024 71931 14

Just quickly scanning these results one might consider reconfiguring the even split of memory on Server1’s instances. InstanceA max memory can perhaps be reduced and the more heavily churned Instance3 max memory increased. On the other hand, on Server2 one might argue that even though Instances A and C have a max memory setting that is tuned quite low and InstanceB is getting the greater slice, InstanceB is still reporting a high relative churn (note the PLE is approaching the 300 mark). Perhaps the server needs more RAM or InstanceB should have a dedicated server.

Decisions, Decisions

Granted these are only snapshot values. These metrics can change often and may vary widely over a day, a week, or a month. A better, more interesting evaluation would include results from these counters, as well as a few others, over some range of time and as the configurations are adjusted – but I’ll save that for another post.

To conclude, the 300 rule turns out to be of some use, but just looking at PLE on these instances we might not be able to come to a reasonable judgment on how to distribute the RAM resource. It is only possible by looking in-depth beyond the 300 second rule to consider the churn rate and, more generally, all the pertinent info; not just PLE, but including (most importantly) buffer pool size and (most specifically) the current buffer pool size in use, as well as minmax server memory and available RAM and CPU count. We need this whole picture presented before us so we can compare, contrast and decide.

12c Upgrade Bug with SQL Tuning Advisor

This blog post outlines steps to take on Oracle upgrade 11.2 to 12.1 if you’re having performance problems. Oracle offers a patch and work around to BUG 20540751.

Megan Elphingstone | March 22, 2017

Oracle EPM Cloud Vs. On-Premises: What’s the Difference?

EPM applications help measure the business performance. This post will help you choose the best EPM solutions for your organization’s needs and objectives.

Bobby Ellis | April 10, 2018

Scripting Out the Logins, Server Role Assignments, and Server Permissions

Imagine there are over one hundred logins in the source server and you need to migrate them all over to the destination server. Wouldn’t it be awesome if we could automate the process by generating the scripts for the required tasks?

JP Chen | October 1, 2015

Work with Us

Let’s have a conversation about what you need to succeed and how we can help get you there.

CONTACT US

Work for Us

Where do you want to take your career? Explore exciting opportunities to join our team.

EXPLORE JOBS