Question

Solved · 3.02K views · 14th December 2021 · external cassandra node Linux

Alasdair Smith 13th October 2021 0 Comments

We have the following error on a separate Linux Cassandra standalone node:

ERROR [CompactionExecutor:51440] 2021-10-13 09:29:44,536 CassandraDaemon.java:217 – Exception in thread Thread[CompactionExecutor:51440,1,main]
java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
at org.apache.cassandra.utils.EstimatedHistogram.rawMean(EstimatedHistogram.java:231) ~[apache-cassandra-3.7.jar:3.7]
at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:220) ~[apache-cassandra-3.7.jar:3.7]
at org.apache.cassandra.io.sstable.metadata.StatsMetadata.getEstimatedDroppableTombstoneRatio(StatsMetadata.java:115) ~[apache-cassandra-3.7.jar:3.7]
at org.apache.cassandra.io.sstable.format.SSTableReader.getEstimatedDroppableTombstoneRatio(SSTableReader.java:1829) ~[apache-cassandra-3.7.jar:3.7]
at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.worthDroppingTombstones(AbstractCompactionStrategy.java:387) ~[apache-cassandra-3.7.jar:3.7]
at org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy.getNextBackgroundSSTables(SizeTieredCompactionStrategy.java:99) ~[apache-cassandra-3.7.jar:3.7]
at org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy.getNextBackgroundTask(SizeTieredCompactionStrategy.java:183) ~[apache-cassandra-3.7.jar:3.7]
at org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(CompactionStrategyManager.java:119) ~[apache-cassandra-3.7.jar:3.7]
at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:258) ~[apache-cassandra-3.7.jar:3.7]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_275]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_275]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_275]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_275]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_275]

Has anyone come across this before and have an idea of the cause and severity?
It’s possibly related to https://issues.apache.org/jira/browse/CASSANDRA-15164.

Thanks for looking.

Alasdair Smith Selected answer as best 14th December 2021

2 Answers

Alexander Gorbunov [SLC] [DevOps Advocate] · 1.20K · Posted 14th October 2021 · 2 Comments

Hi Alasdair,

I believe CASSANDRA-15164, which you referred to, is indeed the cause of the problem. To quote that page:

if we ever have more than about 1.9 billion cells in a partition, the EstimatedHistogram that tracks this will overflow. Then, when compaction attempts to get the mean number of cells per partition EstimatedHistogram throws an IllegalStateException that aborts the attempt at compaction.
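To make the failure mode in that quote concrete, here is a minimal toy model in Python. It is not Cassandra's EstimatedHistogram code: the class name, the bucket count and the 1.2 growth factor are assumptions chosen for illustration. It only shows the general idea that a histogram with a fixed set of bucket boundaries has nowhere to put a value above its last boundary except an overflow bucket, and that a mean can no longer be computed once that bucket is non-empty.

```python
import bisect
import math


class ToyEstimatedHistogram:
    """Toy histogram: fixed bucket boundaries plus one overflow bucket.

    Illustrative only; this is not Cassandra's implementation.
    """

    def __init__(self, bucket_count, growth=1.2):
        # Build boundaries that grow roughly geometrically (assumed factor).
        offsets = [1]
        while len(offsets) < bucket_count:
            candidate = int(offsets[-1] * growth + 0.5)
            offsets.append(max(offsets[-1] + 1, candidate))
        self.offsets = offsets
        # One counter per boundary, plus a final counter for overflowed values.
        self.counts = [0] * (bucket_count + 1)

    def add(self, value):
        # Index of the first boundary >= value; values above the last
        # boundary land in the extra overflow slot at the end.
        self.counts[bisect.bisect_left(self.offsets, value)] += 1

    def is_overflowed(self):
        return self.counts[-1] > 0

    def mean(self):
        # Overflowed values have no known upper bound, so no mean exists.
        if self.is_overflowed():
            raise RuntimeError(
                "Unable to compute ceiling for max when histogram overflowed"
            )
        total = sum(self.counts[:-1])
        if total == 0:
            return 0
        weighted = sum(c * o for c, o in zip(self.counts, self.offsets))
        return math.ceil(weighted / total)
```

In this sketch, a single oversized value is enough to make every later call to mean() fail, which mirrors how one very wide partition can abort each background compaction attempt for the affected table.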

In your case, the table that causes the exception is ‘infotrace’, which has a single partition key for all Information Events, resulting in an unusually high number of cells per partition.

There shouldn’t be any operational impact, since the exception affects only minor compaction. Major compaction, which runs as a scheduled task, should work regardless of the histogram calculations mentioned above.

Alasdair Smith commented 14th October 2021

Thank you, Alexander. What’s the best way to confirm that the full compaction has completed successfully?

Alexander Gorbunov [SLC] [DevOps Advocate] commented 15th October 2021

There’s no easy way to verify the results of a scheduled compaction, as far as I know. One thing you can do is check the used disk space trend and confirm that a reasonable amount of disk space has been freed after compaction.
There may be other tricks, but it would be better to discuss these in a separate Dojo post.
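
The disk space comparison described above could be sketched roughly as follows in Python. This is only an illustration: the helper names are hypothetical, and the path you measure (for example the volume holding the Cassandra data directory) is an assumption that depends on your installation.

```python
import shutil


def used_bytes(path):
    """Return the bytes currently used on the filesystem containing path."""
    return shutil.disk_usage(path).used


def freed_bytes(used_before, used_after):
    """Return how many bytes were freed between two samples (never negative)."""
    return max(0, used_before - used_after)


def format_gib(byte_count):
    """Format a byte count as GiB with two decimals."""
    return f"{byte_count / 2**30:.2f} GiB"
```

Take one sample with used_bytes() before the scheduled compaction window and another afterwards, then pass both to freed_bytes(). Other writes to the same volume will skew the result, so treat it as a trend indicator rather than proof that the compaction succeeded.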

score 2 · Answer 1 · 2021-10-14T14:30:08+00:00

Hi Alasdair,

I would suggest having a look at the table histograms of all your tables (e.g. nodetool tablehistograms SLDMADB data) to see if you can identify which table is causing this. Can you create a ticket on collaboration for Skyline to follow up on this?
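
As a rough sketch of how that check could be repeated across several tables, the Python snippet below runs nodetool tablehistograms once per keyspace/table pair and collects the raw output. The function names are hypothetical, the list of tables is something you would supply yourself, and it assumes nodetool is on the PATH of the Cassandra node.

```python
import subprocess


def histogram_command(keyspace, table):
    """Build the nodetool command line for one table."""
    return ["nodetool", "tablehistograms", keyspace, table]


def collect_histograms(tables, run=subprocess.run):
    """Run tablehistograms for each (keyspace, table) pair.

    Returns a dict mapping each pair to nodetool's stdout, or to its
    stderr when the command exits with a non-zero status.
    """
    results = {}
    for keyspace, table in tables:
        proc = run(
            histogram_command(keyspace, table),
            capture_output=True,
            text=True,
        )
        output = proc.stdout if proc.returncode == 0 else proc.stderr
        results[(keyspace, table)] = output
    return results
```

For example, collect_histograms([("SLDMADB", "data"), ("SLDMADB", "infotrace")]) would return the text for both tables, assuming ‘infotrace’ lives in the same keyspace. A table whose output shows an extreme cell count, or an error instead of a histogram, is the likely culprit.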

Cassandra: Unable to compute ceiling for max when histogram overflowed