How does DM with Cassandra deal with trend data that does not change for long periods of time? In MySQL-based systems, the DM database cleanup task issues queries on every trend table to find outdated data to delete, while also looking for the entry with the MAX id per parameter. It then uses the results of these queries to exclude those entries from deletion, so that at least one old data point is always kept and trend graphs can still plot values that have not changed since the cut-off time. In large tables with lots of trended parameters, this results in DELETE statements that exclude an excessive number of ids. On all of our larger systems this is a serious problem: the DELETE statements contain lists of 50K to 70K ids in the NOT IN() part of the statement. Yes, 50 to 70 thousand ids in a single SQL DELETE statement.
I understand that moving to Cassandra should resolve part of the problem, since DM no longer needs to delete older items from the database itself; that is handled natively by Cassandra using the TTL field. But how do DM/Cassandra handle NOT deleting older items that have not changed? I am concerned that Cassandra-based systems may still have issues with the kind of datasets that cause the MySQL cleaning behavior mentioned above. What size datasets have been used in testing Cassandra? Anything close to what would result in the MySQL DELETE behavior described above?
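Just to give an idea of the shape of what we see (the table and column names below are made up for illustration, but the pattern is the same):

-- Hypothetical sketch of the MySQL cleanup pattern described above:
-- delete trend records older than the cut-off, except the most recent
-- record kept for each parameter so a flat line can still be plotted.
DELETE FROM data_avg_trend
WHERE  ts < '2020-01-01 00:00:00'
  AND  id NOT IN (101, 205, 309 /* ...and 50K-70K more ids... */);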
Thanks in advance.
Hi Jeff,
As of DM 10.0, a feature has been added to handle that use case in Cassandra-based systems.
To prevent trend graphs from being empty after the only remaining data point gets deleted (based on its TTL), DataMiner now has a mechanism in place to refresh the TTL before the record gets deleted. By doing so, we can continue to show a flat line even when the value of the parameter hasn't changed in a long time.
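Conceptually, and leaving aside the actual internal DataMiner schema (the keyspace, table, and column names below are only an illustration), refreshing the TTL boils down to rewriting the last stored data point with a fresh TTL, since in Cassandra a TTL is set per write:

-- Illustrative CQL only; the real schema, keys and TTL value will differ.
UPDATE trend_data USING TTL 2592000       -- e.g. refresh to 30 days
SET    value = 12.5                       -- rewrite the last known value
WHERE  element_id = 123
  AND  parameter_id = 456
  AND  ts = '2020-01-01 00:00:00';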
Note that this mechanism of checking and refreshing the TTL will only trigger while DataMiner is running.
Regards,
Thanks for the info, Miguel. As we discussed yesterday, my main concern and question to the developers is whether the case where there are large numbers of unique parameters in a trend table, say 50K to 70K, has been taken into account and tested for in this new cleanup mechanism for Cassandra-based systems. I don't need to know exactly how it works, just that the new mechanism was designed with this use case in mind and has been tested with similarly large datasets, so that we do not run into the same issues we have with MySQL-based systems.
Thanks again.