
Compaction in Cassandra

Author: Satish Rakhonde | January 9, 2024

Compaction in Cassandra refers to the operation of merging multiple SSTables into a single new one. It mainly deals with the following:

  • Merging keys
  • Combining columns
  • Discarding tombstones

Compaction’s Purpose

Compaction is done for two purposes:

  1. It bounds the number of SSTables that must be consulted on reads. Cassandra allows multiple versions of a row to exist in different SSTables. On a read, the versions are read from the different SSTables and merged into one. Compaction reduces the number of SSTables to consult, and therefore improves read performance.
  2. To reclaim space taken up by obsolete data in SSTables.

Compaction Strategies

Cassandra supports the following compaction strategies, which you can configure using CQL:

  • SizeTieredCompactionStrategy (STCS): This is the default compaction strategy. It triggers a minor compaction when a certain number of similarly sized SSTables are on disk, as specified by the table subproperty min_threshold. A minor compaction doesn’t involve all the tables in a keyspace. This strategy is ideal for write-heavy workloads.
  • LeveledCompactionStrategy (LCS): The leveled compaction strategy creates SSTables of a fixed size (160 MB by default) grouped into levels. Each level contains non-overlapping SSTables. This strategy is recommended for read-heavy workloads. SSDs are recommended due to extensive I/O activities.
  • DateTieredCompactionStrategy (DTCS): This strategy was designed for time series data, but has since been deprecated in favor of TWCS.
  • TimeWindowCompactionStrategy (TWCS): This strategy is an alternative for time series data. TWCS compacts SSTables using time windows or buckets.
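For illustration, here is how a time-series table might opt into TWCS at creation time. The table name, schema, and window settings below are hypothetical; tune the window unit and size to how your data is written and expired:

CREATE TABLE sensor_readings (
    sensor_id uuid,
    reading_time timestamp,
    value double,
    PRIMARY KEY (sensor_id, reading_time)
) WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 1
};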

Tuning Compaction

As of DSE 5.0, there are three parameters to configure when tuning compaction.

  • snapshot_before_compaction: Enables or disables taking a snapshot before each compaction. A snapshot is useful for backing up data during data format changes. However, Cassandra does not automatically delete older snapshots. The default is false.
  • concurrent_compactors: The number of compaction processes allowed to run simultaneously on a node. If you are using SSDs, you can increase this value up to the number of cores. The default is the smaller of the number of disks and the number of cores, with a minimum of 2 and a maximum of 8. Note: Increasing concurrent compactors uses more disk space during compaction because compactions run in parallel, particularly for STCS (SizeTieredCompactionStrategy). Before increasing this setting, make sure sufficient disk space is available.
  • compaction_throughput_mb_per_sec: Throttles compaction to the specified MB per second across the node. Faster data insertion requires faster compaction to keep the SSTable count low. The recommended value is 16 to 32 times the write throughput rate (in MB/second). Setting the value to 0 disables compaction throttling. The default is 16.
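For reference, all three parameters live in cassandra.yaml. A minimal excerpt with the defaults discussed above (concurrent_compactors is normally left commented out so Cassandra can compute it from the hardware):

# cassandra.yaml
snapshot_before_compaction: false
compaction_throughput_mb_per_sec: 16
# concurrent_compactors: 2    # uncomment to override the computed default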

When to Use Which?

Leveled compaction essentially trades more disk I/O for a guarantee on how many SSTables a row may be spread across. Below are the cases where leveled compaction can be a good option.

  • Low latency read is required
  • High read/write ratio
  • Rows are frequently updated. If size-tiered compaction is used, a row will spread across multiple SSTables.

Below are the cases where size-tiered compaction can be a good option.

  • Write heavy workloads.
  • Rows are write once. If the rows are written once and then never updated, they will be contained in a single SSTable naturally. There’s no point in using the more I/O intensive leveled compaction.

Command to Modify Compaction Strategy

ALTER TABLE users WITH compaction = {'class': 'LeveledCompactionStrategy'};

ALTER TABLE users WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': 6};
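Note that after an ALTER TABLE like the above, Cassandra reorganizes the table's existing SSTables according to the new strategy over subsequent compactions, so expect a temporary increase in compaction activity and disk I/O right after the change.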

Minor/Major Compaction

Minor compactions are triggered automatically whenever a memtable is flushed from memory to an SSTable on disk. You can tune the settings for minor compaction according to your performance requirements.

By default, a minor compaction kicks off when 4 or more similarly sized SSTables are on disk, and each run compacts between 4 and 32 such SSTables at a time.
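These defaults correspond to the min_threshold and max_threshold compaction subproperties, which can be tuned per table. A sketch, reusing the hypothetical users table from the earlier examples:

ALTER TABLE users WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': 4, 'max_threshold': 32};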

Running compaction manually with the nodetool command forces a major compaction. Major compactions may behave differently depending on which compaction strategy is used for the affected tables:

  • Size-tiered compaction (STCS) splits repaired and unrepaired data into separate pools for separate compactions. A major compaction generates two SSTables, one for each pool of data.
  • Leveled compaction (LCS) performs size-tiered compaction on unrepaired data. After repair completes, Cassandra moves data from the set of unrepaired SSTables to L0.
  • Date-tiered (DTCS) splits repaired and unrepaired data into separate pools for separate compactions. A major compaction generates two SSTables, one for each pool of data.
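For reference, a major compaction is triggered per keyspace or per table with nodetool; the keyspace and table names here are placeholders:

nodetool compact my_keyspace users

See the note under Slow Read below before running this on a production node.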

Performance Impact

Below are some issues that can arise from mis-tuned compaction parameters.

Node Unreachable

This occurs when there are a large number of pending compaction tasks.

Reason: It is routine to have compaction running while reads and writes are served simultaneously. It is also normal for compaction to fall behind if you keep throwing writes at a node as fast as possible. In this scenario, compaction activity ties up machine resources and the load average climbs, which may cause several nodes to become unreachable.

Possible Solution: If your workload requires compaction to always stay up to date, you will need a larger cluster to spread the load across more machines. You can check pending compaction tasks with this command on a Cassandra node:

nodetool compactionstats

Slow Read

Reads are getting slower while writes are still fast.

Reason: This issue can occur when rows become fragmented over time, requiring more I/O and CPU time for reads. Too many SSTables cause slow reads, and this happens when compactions fall behind: even if flushes are not excessively frequent, compaction might not keep up. A high number of pending compactions is the telltale sign.

Possible Solution: Make sure compaction_throughput_mb_per_sec is set appropriately. The default of 16 MB/sec is chosen for spinning disks; SSDs can use a much higher setting, such as 128 MB/sec or more.
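The throttle can also be adjusted at runtime with nodetool, which avoids a restart. Runtime changes do not persist across restarts, so mirror the value in cassandra.yaml; the 128 MB/sec here is just the SSD example above:

nodetool setcompactionthroughput 128
nodetool getcompactionthroughput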

  • If you have set a high compaction throughput but I/O utilization is low and compactions are still not keeping up, the compactions may be CPU-bound.
  • You need to check the per-core CPU utilization of the CompactionExecutor threads (see the nodetool tpstats example after this list). If a thread is pinning a single core at 100%, compaction is likely CPU-bound. Increasing concurrent_compactors allows multiple concurrent compactions of different sets of SSTables, but compaction of each set of SSTables is inherently single-threaded. If you are using LeveledCompactionStrategy (LCS), you need to either switch to SizeTieredCompactionStrategy (STCS) or add more nodes to spread the compaction load. Note: Increasing concurrent compactors beyond the number of physical CPU cores can be counterproductive. Using all available CPU for compaction leaves none to handle reads and writes. If you need to keep servicing requests while catching up on compactions, leave at least one or two physical CPUs free for reads and writes.
  • Switch to SizeTieredCompactionStrategy. If you are using LeveledCompactionStrategy (LCS) and the above steps haven’t worked, consider switching to SizeTieredCompactionStrategy (STCS). LCS uses more resources to compact than STCS. Often nodes that are falling behind while compacting with LCS can easily keep up using STCS.
  • If reads are slow due to excessive SSTables, one option is to run a major compaction using nodetool. This consolidates all existing SSTables into a single one, temporarily causing a spike in disk space usage and significant disk I/O while the old and new SSTables coexist.

    Note: You should not routinely run nodetool compact. It is not fatal, but it does create very large SSTables. If you are using SizeTieredCompactionStrategy, Cassandra will then wait for a matching number of similarly sized SSTables (configurable via min_threshold) before it triggers the next minor compaction. That said, a major compaction can sometimes help to get rid of deleted or overwritten data.
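To check whether compaction is CPU-bound, as discussed in the list above, you can inspect the CompactionExecutor thread pool; the grep filter is just one convenient way to narrow the output:

nodetool tpstats | grep -E 'Pool|CompactionExecutor'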

Want to chat more about Cassandra with our expert team? Contact us today.
