Specialized IT Services focused on Data Management | Speak with Us 877-634-9222
Who is afraid of the Data Collector? (Part 1)
What is scary to you? For me, top of the list, it’s avalanches and sharks. And I am afraid of the Data Collector. Yup… a little-bit. And also the Management Data Warehouse.
I’m not the only one; when I attended SQL Skills IE2 last year (which is excellent by the way) the introduction to the topic was met with audible groans. In fact, we skipped right over it and on to the next module. Perhaps we all should feel little trepidatious since within this pair of features there are some strange glitches, some hidden gotchas, and a few issues that can bite you or bury you, just like a shark or an avalanche. Yes, the dangers of the Data Collector and Management Data Warehouse are real. But – like sharks and avalanches – the Data Collection feature is also pretty cool… once you know what to watch out for.
In this two-part post I want to go over a few of the gotchas, point out a few best practices and links, and list out some tips and tricks I’ve developed to help get a Data Collection and Management Data Warehouse implementation up and running smoothly and safely.
A little background:
One of the challenges of database administration is establishing and maintaining a baseline of informational and performance statistics for the SQL instances in an environment. It’s a critical function since effective performance troubleshooting and capacity planning require some historical view of a server’s stats. Barring some third party tool that collects and archives a performance history, the typical solution is a roll-your-own combination of perfmon counters, TSQL scripts, and a dedicated ‘admin’ database to collect, parse and archive the baseline data. Without a doubt this can be difficult and time consuming to build, manage and maintain. So, to solve that problem Microsoft included the Data Collection and Management Data Warehouse(MDW) features in versions SQL2008 and above.
You’ll find the Data Collection feature in the SQL Server Management Studio (SSMS) Object Explorer under Management Tools. If Data Collection has not yet been configured it will show up as disabled and you’ll need to step though the Configure Management Data Warehouse Wizard to get it up and running. (But if this is your first introduction to the Data Collector don’t do this until you’ve read more about it below, it’s not so easy to remove).
If Data Collection has already been configured you can expand the node to find the System Data Collection Sets – Disk Usage, Query Statistics and Server Activity – which are included by default as part of the instance install. These three are a pretty good set but you can also add your own custom collectors. One nifty aspect of Data Collection is that the collectors allow the capture of OS level counters and SQL instance level counters and DMV queries within the same time slice, which is no easy feat by any other method. You can read up on how it works here, but In general, each of these collectors is running a SSIS package to collect and aggregate disk, query and server info and write it to the Management Data Warehouse.
The Management Data Warehouse goes hand-in-hand with the Data Collector since it is where the collected data is archived and is the source for the MDW Reports. It is built using the same wizard that sets up Data Collection. The wizard builds the database schema (in a new or existing DB) and adds a number of stored procedures, functions, views, and etc. The wizard also configures a set of SSMS MDW reports. These are a bit like high-octane Performance Dashboard Reports and are accessed by right-clicking the MDW database. The MDW database can be set up on a central instance and house the collected baseline data from many SQL Servers in an environment, or it can be configured as a local database on the same instance as the Data Collectors. Read more about it here.
Sounds great! Pretty much everything needed in a historical-baselineperformance-monitoring solution is already prepared and pre-packaged. Click through a wizard a few times and viola – baselines, performance data, wait stats, long queries and db-size trending. The data is pre-aggregated and centralized and dashboard style reports are set up for the whole enterprise. And it is great, except for…
If you do configure a centralized MDW, keep in mind that the 3 system data collectors generate approximately 600 MB per instance, per day. Depending on the count of instances in your environment the size of the MDW database can quickly grow into the terabyte range.
To address that growth there is a purge process to keep the database size under control. During the MDW install a job is added named “mdw_purge_data_[<mdw db name>]”. This job calls a procedure (core.sp_purge_data) every night which cycles through the collected data and purges any rows older that the ‘Retain data for’ parameter back at the Data Collector. However, the purge process often conflicts with incoming data and may not always complete. If the purge does fail or run-long, the tables are even larger the next time it runs leading to a better likelihood for further failures. This snowball effect can eventually lead to the MDW database never getting fully purged and the database growing to use all the available disk.
Similarly, if the MDW is configured on a central instance, there can be a significant load on the network with so many MB per instance moving across the wire from the SQL servers to the centralized MDW instance.
Stay tuned for our next post where we’ll take a look at the “sharks” and what happens after you enable and configure the Data Collector.