Did You Know NTFS Volumes Could Break CHECKDB?

Author: Karthikeyan Amarnath | April 16, 2024

We had a database maintenance job for integrity checks (DBCC CHECKDB) that had been running fine for ages. Suddenly, the job started failing. Although the drive had plenty of free space, the job returned the following file system limitation error message:

“The operating system returned error 665(The requested operation could not be completed due to a file system limitation) to SQL Server during a write at offset 0x00009741572000 in file ‘F:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\xxxx.mdf_MSSQL_DBCC10’. Additional messages in the SQL Server error log and operating system error log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.”

Initially, we checked whether there was any space crunch during that window and did not see one. We then ran the database maintenance in smaller chunks. Instead of DBCC CHECKDB, we ran DBCC CHECKTABLE in smaller batches, which was successful.
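The batching approach described above can be sketched as follows. This is a minimal illustration, not the script we actually ran: the helper names, the table list, and the batch size are all hypothetical, and the code only builds the T-SQL text. Note that DBCC CHECKDB also performs allocation and catalog checks, so DBCC CHECKTABLE batches alone are not a complete substitute.

```python
from typing import Iterable, List, Tuple


def quote_name(name: str) -> str:
    """Bracket-quote an identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def checktable_batches(
    tables: Iterable[Tuple[str, str]], batch_size: int
) -> List[List[str]]:
    """Turn (schema, table) pairs into batches of DBCC CHECKTABLE statements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    statements = []
    for schema, table in tables:
        # The object name is passed as a string literal, so escape single quotes.
        target = f"{quote_name(schema)}.{quote_name(table)}".replace("'", "''")
        statements.append(f"DBCC CHECKTABLE (N'{target}') WITH NO_INFOMSGS;")
    return [
        statements[i:i + batch_size]
        for i in range(0, len(statements), batch_size)
    ]


if __name__ == "__main__":
    # Hypothetical table list; in practice it would come from sys.tables.
    example = [("dbo", "Orders"), ("dbo", "OrderLines"), ("sales", "Customers")]
    for number, batch in enumerate(checktable_batches(example, 2), start=1):
        print(f"-- batch {number}")
        print("\n".join(batch))
```

Each batch can then be scheduled in its own maintenance window, so that no single run has to hold a snapshot for as long as a full CHECKDB would.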

The following week, we tried DBCC CHECKDB with PHYSICAL_ONLY, and that was also successful. We then ran the full CHECKDB again and ended up with the same issue. When we looked closer, we found that the issue occurs once the internal database snapshot has been created during DBCC CHECKDB; the file named in the error message (the one ending in _MSSQL_DBCC10) belongs to that snapshot.

We discovered that the issue was mainly due to the NTFS format. When we changed the format to ReFS, it resolved the problem.

What Was the Problem with the NTFS Format?

A heavily fragmented file on an NTFS file system may not be able to grow beyond a certain size because of a structural limitation in how NTFS tracks the file's allocations. The sparse files behind a DBCC snapshot or a snapshot database can drive attribute exhaustion.

By default, every DBCC CHECKDB execution creates an internal database snapshot. For a very large database (VLDB), NTFS needs a large number of ATTRIBUTE_LIST_ENTRY instances to maintain a heavily fragmented file. If newly allocated space is adjacent to a cluster that’s already tracked by the file system, the attributes are compressed into a single entry.

However, if the space is fragmented, it has to be tracked with multiple attributes. Thus, heavy file fragmentation can lead to attribute exhaustion, which causes error 665 and, in turn, the CHECKDB failure.
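The difference between adjacent and fragmented allocations can be illustrated with a toy model. This is only a sketch of the idea described above, not NTFS's actual on-disk structures: the `Run` tuple, the function name, and the cluster numbers are assumptions made for illustration.

```python
from typing import List, Tuple

# (starting cluster, length in clusters); a simplified stand-in for one tracked extent
Run = Tuple[int, int]


def coalesce_runs(allocations: List[Run]) -> List[Run]:
    """Fold each allocation into the previous run when it starts exactly where
    that run ends; otherwise record it as a new, separately tracked run."""
    tracked: List[Run] = []
    for start, length in allocations:
        if tracked and tracked[-1][0] + tracked[-1][1] == start:
            prev_start, prev_length = tracked[-1]
            tracked[-1] = (prev_start, prev_length + length)
        else:
            tracked.append((start, length))
    return tracked


if __name__ == "__main__":
    contiguous = [(0, 16), (16, 16), (32, 16)]
    fragmented = [(0, 16), (100, 16), (200, 16)]
    print(len(coalesce_runs(contiguous)))  # three allocations, one tracked run
    print(len(coalesce_runs(fragmented)))  # three allocations, three tracked runs
```

In this model the same amount of data costs one entry when the allocations are contiguous and one entry per allocation when they are scattered, which is why a large sparse snapshot file on a fragmented volume can run out of tracking capacity long before the drive runs out of space.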

What Are the Feasible Options to Fix This Issue?

The following options can eliminate this issue. All of them require database downtime.

Add a new drive formatted with ReFS (not NTFS) and move the data files to that new volume. ReFS does not have the same ATTRIBUTE_LIST_ENTRY limits.
Defragment the volume where the data files reside. Make sure the defragmentation utility is transactional and that SQL Server is shut down during the process. Defragmentation works differently on solid-state drive (SSD) media and typically doesn’t address the problem. Copying the file(s) and allowing the SSD firmware to repack the physical storage is often a better solution.
File copy – copying the file may allow better space acquisition because the bytes might be packed tightly together in the process. Copying the file (or moving it to a different volume) may reduce attribute usage and may prevent OS error 665. Copy one or more of the database files to another drive. You may then leave the file on the new volume or copy it back to the original volume.
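For the options that relocate a data file, the T-SQL side of the move can be sketched as below. This is a hedged outline of the usual offline relocation sequence rather than the exact steps we ran: the database name, logical file name, and target path are hypothetical, and the physical copy itself happens outside SQL Server.

```python
from typing import List


def quote_name(name: str) -> str:
    """Bracket-quote an identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def move_file_statements(database: str, logical_name: str, new_path: str) -> List[str]:
    """Build the statements for relocating one database file while offline."""
    db = quote_name(database)
    path = new_path.replace("'", "''")  # escape single quotes in the literal
    return [
        f"ALTER DATABASE {db} SET OFFLINE;",
        "-- Copy the physical file to the new volume here (outside SQL Server).",
        f"ALTER DATABASE {db} MODIFY FILE "
        f"(NAME = {quote_name(logical_name)}, FILENAME = N'{path}');",
        f"ALTER DATABASE {db} SET ONLINE;",
    ]


if __name__ == "__main__":
    # Hypothetical names and path, for illustration only.
    for line in move_file_statements("SalesDB", "SalesDB_Data", r"R:\Data\SalesDB.mdf"):
        print(line)
```

The database stays offline from the first statement until the last, which is why all of these options need a downtime window.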

How to Avoid This Kind of Scenario?

This issue mainly occurs on larger databases. If you have databases larger than 2 TB, it is advisable to follow the best practices below:

Have multiple data files instead of one.
If possible, allocate drives with ReFS format for data files instead of NTFS.
If you cannot use the ReFS format, format the drive with NTFS using the /L option to obtain large file record segments (FRS). This makes the ATTRIBUTE_LIST_ENTRY larger and helps to avoid attribute exhaustion.
Limit the frequency of database growth by setting an adequate autogrowth size, which reduces fragmentation.
Always perform DB maintenance in the off hours.
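To see why the autogrowth size matters, it helps to count how many growth events a file goes through. The following is a rough sketch under the assumption of a fixed-size growth increment; the function name and the sizes are illustrative, and the number of fragments that actually result depends on the state of the volume.

```python
def growth_events(current_mb: int, target_mb: int, increment_mb: int) -> int:
    """Number of fixed-size autogrowth events needed to reach target_mb."""
    if increment_mb <= 0:
        raise ValueError("increment_mb must be positive")
    if target_mb <= current_mb:
        return 0
    remaining = target_mb - current_mb
    # Ceiling division without floating point.
    return -(-remaining // increment_mb)


if __name__ == "__main__":
    one_tb_in_mb = 1024 * 1024
    print(growth_events(0, one_tb_in_mb, 64))    # growing 1 TB in 64 MB steps
    print(growth_events(0, one_tb_in_mb, 8192))  # growing 1 TB in 8 GB steps
```

Each growth event is a separate allocation request that may or may not land next to the previous one, so fewer, larger growths give the file system fewer opportunities to scatter the file.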

What Is the Long-term Solution?

Using the ReFS format is the long-term solution for this issue. Although the options mentioned above can fix it, the permanent solution is to use ReFS volumes for the drives that hold larger databases.

References:

https://learn.microsoft.com/en-us/troubleshoot/sql/database-engine/database-file-operations/1450-and-665-errors-running-dbcc-checkdb

https://learn.microsoft.com/en-us/archive/blogs/psssql/operating-system-error-665-file-system-limitation-not-just-for-dbcc-anymore

Blog Author

Karthikeyan Amarnath

Database Administrator

Karthikeyan Amarnath is a seasoned IT infrastructure management professional with expertise in handling databases in on-premises and cloud environments (Azure & AWS). His skills include consulting, technical analysis, designing cloud infrastructure solutions, managing infrastructure service delivery, datacenter, and cloud operations. Committed to ensuring quality and service excellence, he demonstrates the ability to adapt to new technologies, policies, procedures, business processes, and SLAs. He has a proven track record in efficiently migrating databases from on-premises to cloud and between different cloud platforms. He actively engages in cloud cost optimization solutions and database transformations such as IaaS to PaaS, EC2 to RDS, and managing other MySQL/PostgreSQL databases.
