Grocery Shopping – two ways to go about it:

Option one:
1. Walk in and begin randomly wandering the aisles…
2. Grab milk, salsa, batteries, 4 oranges, a dozen eggs, duct tape, a loaf of bread, and an 8-pack of paper towels, bundling each item into your arms as best you can
3. Realize this would be much easier with a shopping cart

Option two:
1. Walk in and review a pre-made shopping list…
2. Decide – hand basket or cart?
3. Methodically proceed – aisle by aisle – to collect all listed items and place them in the basket or cart…
4. Proceed to checkout
You get the point. It’s usually better to have a plan. It’s the same for SQL Server. When a query is submitted, SQL checks its shopping list (the query), figures out the most efficient path (the execution plan), goes to get the items (table by table), and proceeds to checkout (returns the data).
The SQL Cardinality Estimator (CE) is the critical piece of SQL query processing that calculates the metrics which determine a query plan’s memory allocation, its shape, and its operators. Just like you might look over your grocery list on the way into the store to decide whether you’ll need a hand basket or a buggy, the CE gets approximate row count information from a table’s statistics and – along with the query optimizer – uses that info to choose, for instance, a merge join vs. a nested loops join or a seek vs. a scan, and build a plan.
It works great, at least as long as we have an accurate interpretation of our grocery list. Things go awry if, say, “potatoes” are on the list and we thought we’d pick out 2 or 3… but come to find out they are only sold by the 10lb bag. The hand basket would have been fine for a few spuds but now it seems we should have brought the cart.
That skew between estimated and actual potatoes happens in SQL Server too, and it can result in suboptimal plans, which means slower query performance. To avoid this estimated-vs-actual discrepancy (for some, not all, situations), the Microsoft team improved the Cardinality Estimator logic in SQL Server 2014. What kinds of changes did they make? Let’s talk some more potatoes. Here are a couple of examples:
Let’s say we are in the produce section to pick up a couple potatoes and we get a phone call from home. Surprise! Friends are coming over for potato dinner -again- tonight! Our estimate that we needed a couple Russets was incorrect; we actually will need to get many more potatoes!
A classic example of this is a table that includes a date column, where inserts ascend by that date. If a query runs against this table for a recent date – so recent that the auto-update-stats process has not yet kicked off for the newly inserted data – statistics for that date don’t yet exist. In this case, the legacy CE just takes a guess and estimates one row… yes, you heard that right: SQL assumes just one row will be returned. Of course this will not be even close to correct in many scenarios; significant estimate-vs-actual disagreement is likely. The new CE instead uses an average frequency calculation based on the existing statistics (cardinality * density). In other words, on the fly, SQL figures out the average row count for any given date value in the table and uses that number for its estimate. Again, this also may not be accurate in every situation, but it’s a good bet that this average will be closer to the actual value than the single-row assumption.
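To make the ascending key scenario concrete, here is a minimal T-SQL sketch. The table, index, and column names are hypothetical, purely for illustration:

```sql
-- Hypothetical illustration of the ascending key problem.
CREATE TABLE dbo.Orders (
    OrderID   INT IDENTITY PRIMARY KEY,
    OrderDate DATETIME NOT NULL,
    Amount    MONEY NOT NULL
);
CREATE INDEX IX_Orders_OrderDate ON dbo.Orders (OrderDate);

-- ... statistics get updated ... then new rows are inserted today,
-- beyond the highest date in the statistics histogram ...

-- For a predicate on one of those brand-new dates, the legacy CE
-- estimates 1 row; the new CE instead estimates the average rows
-- per distinct date (row count * density) from existing statistics.
SELECT OrderID, Amount
FROM dbo.Orders
WHERE OrderDate >= CAST(GETDATE() AS DATE);
```

Running the query with the actual execution plan captured under each CE would show the differing row estimates on the index operators.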
Potatoes for dinner again? We’re going to need some butter. There’s a correlation here – we need about a quarter stick per potato.
Ok, enough about potatoes. Let’s move right into the classic example of column correlation. Consider a data table of people that includes locational data: city, state, and country. Now consider a query executing against the table for people with a city, state, and country equal to New York, New York, USA. You and I can see how these columns relate and that this query will return many rows. But the legacy CE is not set up to understand correlated columns. The ‘where’ clause here is quite restrictive – three specific values for three distinct predicates – so the legacy CE might well estimate a very low row count… not too different, potentially, from what it would estimate for Amsterdam, California, and India. The new CE is set up a bit looser to allow for the possibility that predicate columns may be correlated. It does this via a calculation called ‘exponential backoff’ which, to simplify, weights estimations based (inversely) on the selectivity of each additional column. Less selective columns (i.e. more duplicate values) yield larger estimates. And larger estimates, in general, are closer to the actual row counts.
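To see the difference in arithmetic, here is a small T-SQL sketch. The selectivity values are made-up assumptions for illustration, not real statistics, and the backoff shown (square-rooting each additional predicate) is a simplification of the new CE’s behavior:

```sql
-- Illustrative selectivities for the three predicates,
-- ordered from most to least selective.
DECLARE @s1 FLOAT = 0.001,  -- city    = 'New York'
        @s2 FLOAT = 0.05,   -- state   = 'New York'
        @s3 FLOAT = 0.3;    -- country = 'USA'

SELECT
    -- Legacy CE: full independence – multiply all selectivities.
    @s1 * @s2 * @s3                      AS legacy_ce_selectivity,
    -- New CE: exponential backoff – each additional predicate
    -- contributes progressively less to the combined selectivity.
    @s1 * SQRT(@s2) * SQRT(SQRT(@s3))    AS new_ce_selectivity;
```

Multiplying each selectivity by the table’s row count gives the row estimate; the backed-off value is roughly an order of magnitude larger here, which for correlated columns tends to land closer to reality.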
The ‘classic’ examples, and much of the background for this post, come from Joe Sack’s white paper “Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator”. In the paper, Joe describes a number of other changes in cardinality estimation the new CE implements including changes to improve join estimates and distinct value estimates.
Turn it on. Or turn it off.
As pointed out in the examples above, the CE changes aim to improve cardinality estimates and query plans. It’s a best effort at the best fit for most situations, but there is no guarantee that every query will run faster as of SQL 2014. In fact, the estimates for some queries via the new CE may be less accurate and may result in reduced query performance. To that end, SQL Server allows a DBA to toggle the new CE on and off according to the situation. Here are a few points on turning the new CE on or off:
Database Compatibility. The new CE is enabled when database compatibility is set to 120 (SQL 2014). Any new database created on a SQL 2014 instance will have a database compatibility level to match the instance version – i.e. 120 for SQL 2014. However, database restores, database attaches, and even in-place upgrades from earlier versions leave the database at its original, older compatibility level. This is definitely something to watch out for when upgrading/migrating to SQL 2014+. Ideally the database compatibility level should match the instance version so as to take advantage of the latest functions, features, and improvements. But if the new CE is not desired for the given workload, must the database remain at a lower compatibility? No. SQL Server gives the DBA the ability to enable/disable the new CE (given a DB compatibility level of at least 110) via trace flags.
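Checking and raising a database’s compatibility level might look like the following sketch (YourDatabase is a placeholder name):

```sql
-- List each database's current compatibility level
-- (120 = SQL Server 2014).
SELECT name, compatibility_level
FROM sys.databases;

-- Raise a restored/attached/upgraded database to the 2014 level,
-- which also enables the new CE for queries in that database context.
ALTER DATABASE YourDatabase SET COMPATIBILITY_LEVEL = 120;
```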
Trace Flag 9481 – revert to the legacy CE. Forces queries running within a database context with a compatibility level of 120+ to use the old CE functionality.
Trace Flag 2312 – enable the new CE. Forces queries running within a database context with a compatibility level below 120 (e.g. 110) to use the new CE functionality.
Enable trace flag functionality instance-wide via a start-up parameter (e.g. -T9481).
Enable the functionality session-wide (or globally with the -1 switch) via DBCC TRACEON ():
E.g. DBCC TRACEON (9481);
Enable the functionality per query via the QUERYTRACEON hint:
E.g. SELECT column1 FROM table1 WHERE type = 'potato' OPTION (QUERYTRACEON 2312);
Old or New?
This granular level of control is very handy, but it raises the question: how can one determine when and where to enable or disable the new CE? Let’s start by saying that for long-term supportability, simplification, and standardization, the goal is to get a migrating database up to database compatibility level 120 with the new CE enabled. Once there, we can hope to see an overall performance improvement, and the next step is to identify the handful (hopefully it’s just a few) of queries that might benefit from the 9481 flag. Of course, if performance trends the other direction after the upgrade, it may make sense to enable 9481 system-wide and identify specific queries to use 2312 (note: lower-level configurations override higher levels). In either case, this all needs to be done in a test environment, both to avoid surprise performance problems in production and because capturing the estimated vs. actual row counts, which can point to CE issues, is intensive and intrusive.
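As a rough sketch of what that capture can look like in a test environment, the actual execution plan for a suspect query can be pulled with SET STATISTICS XML (the table and column names below are the hypothetical ones from the earlier example):

```sql
-- Capture the actual execution plan, including both estimated
-- and actual row counts, for a suspect query.
SET STATISTICS XML ON;

SELECT column1
FROM table1
WHERE type = 'potato';

SET STATISTICS XML OFF;
-- In the returned showplan XML, compare EstimateRows vs. ActualRows
-- on each operator; large skews point at cardinality estimation issues.
```

This is per-session and adds overhead, which is part of why the post above recommends doing this investigation in test rather than production.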
I’ll follow up in another post to look at deducing cardinality estimate vs actual problems and options to troubleshoot and resolve, including more on the CE options discussed in this post.
How did these suggestions work for you? Let us know your thoughts.