In a previous blog post we talked about what scares me most. Sharks and avalanches. While they may terrify me, they are still pretty cool… when you know enough about them. This is similar to the dangers of the Data Collector and Management Data Warehouse (MDW). But I set out to learn more about them and I share my insight with you here.
We left off on the “sharks,” or in other words, what happens after you enable and configure the Data Collector.
Once you enable and configure the Data Collector it seems there is no going back. If you change your mind and want to clean things up there is no easy way to remove the associated SQL Agent jobs and MDW database objects. You’ll note attempts to drop the SQL Agent Jobs associated with the Data Collector Sets result in a “DELETE statement conflicted with the REFERENCE constraint” error.
The best you can do is disable the jobs andor stop the Data Collection Sets. For more details see this Connect post. (Of interest: check out the comments section and note how many well-known members of the SQL community, bloggers and authors, have sounded off on this one).
If you have multiple instances sharing a server, beware the schedules of the data collection upload jobs. By default they run every 15 minutes beginning at 12 a.m. A single upload is not much concern, but if two or more instances are uploading concurrently there can be some conflict resulting in CPU andor memory use spikes on the source server, as well as conflicts in the MDW database.
Back to the MDW and reporting side of things, although in general I like the layout and information available via the Management Data Warehouse Overview, I find the delivery a bit clunky. It’s slow to refresh zooming in and out of time ranges and windows and its slow going back to previous views. Also, the time range is limited to 24 hours in the Server Reports.
Digging deeper into the pre-packaged reports, under the Disk Usage Collection Set report you’ll notice that the Disk Usage History (via the DB name links) gives an error; “MDW Disk Usage for Database Report Error – A data source has not been supplied for the data source DS_TraceEvents”. Per this Connect post, this is not yet fixed. The good news is that this is only an issue if the MDW database is centralized.
Also under the Disk Usage Collection Set report you’ll find the reports by Database and Log (linked via the blue trend lines) are inaccurate. You can read more about it here where I take a closer look at what is going wrong.
Dangerous, but cool:
Ok, so we have established there are a number of things that can go awry when configuring and using Data Collection and the MDW – it’s certainly not perfect. However, we also know the underlying idea is a good one – a quick and easy baseline and performance monitoring solution. Let’s review the pros and cons…
- No third party tools
- Out of the box solution to capture and report on baseline and performance metrics
- No need to make and maintain roll-you-own perfmon collections and aggregation scripts.
So, what to do about those avalanches and sharks?
First, consider ignoring the option to centralize the MDW. Instead – similar to a local perfmon baseline – set each Data Collector to upload to a local MDW (preferably a dedicated utilityadmin database which each instance should have IMHO).
It’s a shame to miss out on the convenience of the centralization option but keeping the MDW local mitigates both the disk and network over-use issues as well as some of the reporting troubles. A big shop might have the infrastructure to support a huge central MDW. For small and mid-size shops, you’re not losing key functionality by not centralizing the MDW, but you may be improving the current local baseline solution (if there is one).
Second, compress the tables in the MDW.
Then , explore the core and snapshot schema objects within the MDW and break free of the inflexibilities of the SSMS MDW reports. There are a number of views and procedures that can be executed outside of the SSMS MDW reports and leveraged into a custom reporting solution.
In fact, it’s feasible to reconsider centralized reporting as a custom solution if you set up some process to collect the local MDW data from each instance using the available procedures. Here is one example of a stored procedure that underlies many of the SSMS MDW reports (you can discover more using SQL Profiler.)
snapshots.rpt_generic_perfmon [ @ServerName,@EndTime,@WindowSize,@DataGroupID,@CollectionSetUid]
The @EndTime and @WindowSize parameters allows you to pull as much data as you like from the MDW database. The @DataGroup parameter identifies the type of report: ServerActivity, SqlActivity, SqlMemoryUsage, SystemCpuUsage, SystemDiskUsage, SystemMemoryUsage. (Note: for any of these use N’49268954-4FD4-4EB6-AA04-CD59D9BB5714′ for the CollectionSetUid.)
The data collected by the system collection sets can be quite valuable once you get to know it. As part of a monthly review for each instance I like to use these stored procedures to pull a month of data from the MDW database into a csv file. Then, back on my PC, I can use that file to build excel charts to add to a monthly SQL instance review.
Take control of the size and growth of the MDW database. Don’t blindly accept the default retention days on the System Data Collection Sets (14 days for Server and Query, 730 days for Disk). Instead configure each Data Collector to keep only as much as you require. Or use the @retention_days parameter of the core. sp_purge_data proc to keep all the data purged to a reasonable range. Per this KB article, the purge process has been improved, however you might still review this MSDN question and the links within – especially regarding indexing – if you run into problems with the purge process.
Likewise, review the default schedules of the data collection job, especially for stacked instances. Don’t allow two upload jobs to run at the same 15 minute interval.
Turn off the Query Statistics Data Collection Set. Cool as it is, it takes a lot of disk space to archive this data. Also, some of the reports based of this collection reference the plan cache. If the query you are examining does not still exist in cache the report may not be complete. Approach this data collector and its reports more like a profiler trace. Start the Query Statistics Data Collection Set for a period of interest, then stop it and analyze the results.
Stagger the start times for upload jobs for instances that share a server, but also –if the MDW is centralized – across the environment to minimize insert conflicts within the MDW database.
And finally, if you need to clean up a Data Collection and MDW setup, refer to this very comprehensive post from Aron Bertrand.
What have your experiences been? Let me know, I’d love to hear from you.
The “ORA-12154: TNS:could not resolve the connect identifier specified” Oracle error is a commonly seen message for database administrators.
This blog reviews how you can generate scripts for SQL server logins, role assignments, and server permissions for a smooth migration.
Which RAID should you use with SQL Server? Learn the differences between RAID 0, RAID 1, RAID 5, and RAID 10, along with best practices.