We are seeing the following error on a separate, standalone Linux Cassandra node:
ERROR [CompactionExecutor:51440] 2021-10-13 09:29:44,536 CassandraDaemon.java:217 - Exception in thread Thread[CompactionExecutor:51440,1,main]
java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
    at org.apache.cassandra.utils.EstimatedHistogram.rawMean(EstimatedHistogram.java:231) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:220) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.io.sstable.metadata.StatsMetadata.getEstimatedDroppableTombstoneRatio(StatsMetadata.java:115) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.io.sstable.format.SSTableReader.getEstimatedDroppableTombstoneRatio(SSTableReader.java:1829) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.worthDroppingTombstones(AbstractCompactionStrategy.java:387) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy.getNextBackgroundSSTables(SizeTieredCompactionStrategy.java:99) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy.getNextBackgroundTask(SizeTieredCompactionStrategy.java:183) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(CompactionStrategyManager.java:119) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:258) ~[apache-cassandra-3.7.jar:3.7]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_275]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_275]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_275]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_275]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_275]
Has anyone come across this before, and does anyone have an idea of the cause and severity?
It's possibly related to https://issues.apache.org/jira/browse/CASSANDRA-15164
Thanks for looking.
Hi Alasdair,
I believe that CASSANDRA-15164 you referred to is indeed the cause of the problem. Quote from that page:
if we ever have more than about 1.9 billion cells in a partition, the EstimatedHistogram that tracks this will overflow. Then, when compaction attempts to get the mean number of cells per partition EstimatedHistogram throws an IllegalStateException that aborts the attempt at compaction.
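To make the failure mode in the quote more concrete, here is a simplified sketch of a bucketed histogram with a final overflow bucket. It is a hypothetical model (the class names `OverflowHistogramDemo` and `SimpleHistogram` are invented for illustration), not Cassandra's actual `EstimatedHistogram`: it assumes bucket boundaries that grow by roughly 20% each step and uses each bucket's upper bound as its representative value. Once any value lands past the last boundary, the mean can no longer be estimated, so the sketch throws an `IllegalStateException`, mirroring the behaviour described in CASSANDRA-15164.

```java
/**
 * Hypothetical, simplified model of a bucketed histogram with an overflow bucket.
 * NOT Cassandra's EstimatedHistogram; it only illustrates why a mean cannot be
 * computed once a value exceeds the largest tracked bucket boundary.
 */
public class OverflowHistogramDemo {

    static final class SimpleHistogram {
        private final long[] offsets; // inclusive upper bound of each regular bucket
        private final long[] counts;  // one extra slot at the end acts as the overflow bucket

        SimpleHistogram(int bucketCount) {
            offsets = new long[bucketCount];
            long last = 1;
            offsets[0] = last;
            for (int i = 1; i < bucketCount; i++) {
                // Assumption: boundaries grow by ~20%, but always by at least 1.
                long next = Math.round(last * 1.2);
                if (next == last) {
                    next++;
                }
                offsets[i] = next;
                last = next;
            }
            counts = new long[bucketCount + 1];
        }

        void add(long value) {
            int i = 0;
            while (i < offsets.length && value > offsets[i]) {
                i++;
            }
            counts[i]++; // i == offsets.length means the value went into the overflow bucket
        }

        long maxTrackable() {
            return offsets[offsets.length - 1];
        }

        double mean() {
            if (counts[counts.length - 1] > 0) {
                // The overflow bucket has no upper bound, so no estimate is possible.
                throw new IllegalStateException("histogram overflowed");
            }
            double sum = 0;
            long n = 0;
            for (int i = 0; i < offsets.length; i++) {
                sum += (double) counts[i] * offsets[i];
                n += counts[i];
            }
            return n == 0 ? 0 : sum / n;
        }
    }

    public static void main(String[] args) {
        SimpleHistogram h = new SimpleHistogram(10);
        h.add(3);
        h.add(5);
        System.out.println("mean = " + h.mean()); // prints "mean = 4.0"

        // One oversized "partition" beyond the last boundary poisons the whole histogram.
        h.add(h.maxTrackable() + 1);
        try {
            h.mean();
        } catch (IllegalStateException e) {
            System.out.println("mean failed: " + e.getMessage());
        }
    }
}
```

The key point the sketch shows is that a single value past the last boundary is enough: the mean is not just inaccurate but undefined, which is why the caller (here, the tombstone-ratio check during background compaction) fails rather than degrading gracefully.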
In your case, the table that causes the exception is 'infotrace', which uses a single partition key for all information events, resulting in an unusually high number of cells per partition.
There shouldn't be any operational impact, since the exception only affects minor (background) compaction. The major compaction, which runs as a scheduled task, should work regardless of the histogram calculation mentioned above.
As far as I know, there's no easy way to verify the results of a scheduled compaction. One thing you can do is monitor the trend in used disk space and confirm that a reasonable amount of space has been freed after compaction.
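As a rough way to track that disk-space trend, the sketch below sums the size of all regular files under a data directory and prints the total with a timestamp. The class name `DataDirUsage` and the default path `/var/lib/cassandra/data` are assumptions for illustration; pass the node's actual data directory as the first argument. Running it before and after a scheduled compaction (for example from cron) gives a simple before/after comparison.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.util.stream.Stream;

/** Hypothetical helper: reports the total size of files under a data directory. */
public class DataDirUsage {

    /** Sums the sizes of all regular files under root (recursively). */
    static long totalBytes(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .mapToLong(p -> {
                        try {
                            return Files.size(p);
                        } catch (NoSuchFileException e) {
                            return 0L; // file removed mid-walk, e.g. an SSTable replaced by compaction
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; override with the real data directory.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "/var/lib/cassandra/data");
        long bytes = totalBytes(dataDir);
        System.out.printf("%s %s %d bytes (%.2f GiB)%n",
                Instant.now(), dataDir, bytes, bytes / (1024.0 * 1024 * 1024));
    }
}
```

Logging one line per run keeps the output easy to append to a file and compare over time; it measures raw file sizes only, so snapshots or other files under the same directory are counted too.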
There may be other tricks, but it would be better to discuss these in a separate Dojo post.
Hi Alasdair,
I would suggest having a look at the table histograms of all your tables (e.g. nodetool tablehistograms SLDMADB data) to see if you can identify which table is causing this. Can you create a ticket on collaboration for Skyline to follow up on this?
Thank you Michiel, I created a ticket on collaboration.
Thank you, Alexander. What's the best way to confirm that the full compaction has completed successfully?