Select Page

Did You Know NTFS Volumes Could Break CHECKDB?

Author: Karthikeyan Amarnath | 6 min read | April 16, 2024

We had a DB maintenance job for integrity check (DBCC CHECKDB) that was running fine for ages. Suddenly, we noticed that job was failing with a space limitation error. While we had enough space on the drive, the job failed and returned the following file system limitation error message:

 

“The operating system returned error 665(The requested operation could not be completed due to a file system limitation) to SQL Server during a write at offset 0x00009741572000 in file ‘F:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\xxxx.mdf_MSSQL_DBCC10’. Additional messages in the SQL Server error log and operating system error log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.”

Initially, we checked If there was any space crunch during that window and did not see any. We then executed the database maintenance in smaller chunks. Instead of CHECKDB, we did CHECK TABLE in smaller batches, which was successful.

The following week, we tried DBCC CHECKDB with PHYSICAL_ONLY and that was also successful. We then tried the CHECKDB and ended up with the same issue. When we looked closer, we found that the issue occurs once the snapshot created while performing the DBCC CHECKDB.

We discovered that the issue was mainly due to the NTFS format. When we changed the format to ReFS, it resolved the problem.

What Was the Problem with the NTFS Format?

A heavily fragmented file in a NTFS file system may not grow beyond a certain size due to its structural limitation. The sparse nature of DBCC snapshot or a snapshot database can drive attribute exhaustion.

As we know, on any DBCC CHECKDB execution, the database snapshot will get created. If it is a VLDB, then a large number of ATTRIBUTE_LIST_ENTRY instances are needed to maintain a heavily fragmented file in NTFS. If the space is next to a cluster that’s already tracked by the file system, then the attributes are compressed into a single entry.

However, if the space is fragmented, it has to be tracked with multiple attributes. Thus, heavy file fragmentation can lead to attribute exhaustion causing 665 errors followed by CHECKDB failure.

What are the Feasible Options to Fix This Issue?

Following are the list of options we could use to eliminate this issue. All the below options require database downtime.

  1. Add a new drive with ReFS format (not NTFS) and move the data files to that new volume. ReFS do not have the same ATTRIBUTE_LIST_ENTRY limits.
  2. Defragment the volume where the data files reside. Make sure the defragmentation utility is transactional and SQL is down during the process. Defragmentation works differently on solid-state drives (SSD) media and typically doesn’t address the problem. Copying the file(s) and allowing the SSD firmware to repack the physical storage is often a better solution.
  3. File copy – performing a copy of the file may allow better space acquisition because the bytes might be tightly packed together in the process. Copying the file (or moving it to a different volume) may reduce attribute usage and may prevent the OS error 665. Copy one or more of the database files to another drive. Then, you may leave the file on the new volume or copy it back to the original volume.

How to Avoid This Kind of Scenario?

This issue mainly occurs on larger databases. If you have databases that are more than 2 TB, then it is advisable to follow any of the below best practices:

  • Have multiple data files instead of one.
  • If possible, allocate drives with ReFS format for data files instead of NTFS.
  • If you have limitations on having the ReFS format, then format the drive with NTFS using the /L option to obtain large FRS. This makes the ATTRIBUTE_LIST_ENTRY larger and helps to avoid attribute exhaustion.
  • Limit the frequency of database growth by setting adequate auto growth size which reduces the fragmentation.
  • Always perform DB maintenance in the off hours.

What is the Long-term Solution?

Using ReFS format is the long-term solution for this issue. Though we have different options mentioned above to fix this issue, the permanent solution would be having ReFS volumes for the drive holding larger databases.

Need help overcoming your database administrator challenges? Get in touch with our experts today.

References:

https://learn.microsoft.com/en-us/troubleshoot/sql/database-engine/database-file-operations/1450-and-665-errors-running-dbcc-checkdb

https://learn.microsoft.com/en-us/archive/blogs/psssql/operating-system-error-665-file-system-limitation-not-just-for-dbcc-anymore

How to Solve the Oracle Error ORA-12154: TNS:could not resolve the connect identifier specified

The “ORA-12154: TNS Oracle error message is very common for database administrators. Learn how to diagnose & resolve this common issue here today.

Vijay Muthu | February 4, 2021

Data Types: The Importance of Choosing the Correct Data Type

Most DBAs have struggled with the pros and cons of choosing one data type over another. This blog post discusses different situations.

Craig Mullins | October 11, 2017

Oracle RMAN Backup and Recovery with Restore Points

Oracle restore points are useful for benchmark testing. Find out how you can use Oracle’s Recovery Manager (RMAN) tool to create and use restore points.

Cindy Putnam | May 3, 2019

Subscribe to Our Blog

Never miss a post! Stay up to date with the latest database, application and analytics tips and news. Delivered in a handy bi-weekly update straight to your inbox. You can unsubscribe at any time.

Work with Us

Let’s have a conversation about what you need to succeed and how we can help get you there.

CONTACT US

Work for Us

Where do you want to take your career? Explore exciting opportunities to join our team.

EXPLORE JOBS