Hi,
I'm implementing a logger table in a protocol that will be used to store a configuration object linked to a specific booking. A logger table is used in this case because there is a huge number of bookings on the system and this configuration object can be quite big.
How should I design this logger table? I have read through the explanation in the DataMiner Development Library, but I still have a couple of doubts:
Clustering
"In case the logger table persists in Cassandra, also provide a Database.CQLOptions.Clustering tag. From DataMiner version 9.0.0 onwards, if the Param.Database.CQLOptions.Clustering tag is used, the primary key (i.e. index) set in the ArrayOptions tag will be replaced by the primary key defined in the Clustering tag."
In my case, this configuration object will always be linked to the primary key (the ID of the booking). Does this mean I don't have to configure a Clustering tag? The logger table in my case will persist in Cassandra, so if the tag is required, I'm not sure what exactly to configure.
Partitioning
"Partitioning in Cassandra is supported from DataMiner version 9.0.0 onwards. If ColumnDefinition is set to "DATETIME" and the Partition tag is set, Cassandra will use a TTL with the specified time."
The configuration objects can be updated multiple times until the booking has finished. After that, we should still have the last version of this object available to allow duplicating past bookings. What would the best partitioning configuration be for this case?
Thanks!
Hello,
Here is how I would interpret these two options:
Clustering: How do I request my data? (Partition keys must always be supplied to request data, while clustering keys are optional.) In your case, you will always request using a booking ID, so setting just that as the primary key makes it the partition key, and you don't have to bother with clustering.
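In plain CQL terms (a hypothetical sketch; the table and column names are assumptions, not what DataMiner actually generates), a table keyed only on the booking ID looks like this:

```sql
-- Illustrative CQL equivalent of a logger table keyed on booking ID only.
-- With a single-column primary key, booking_id is the partition key and
-- there are no clustering columns.
CREATE TABLE booking_config (
    booking_id text,
    config     text,          -- the (potentially large) configuration object
    PRIMARY KEY (booking_id)  -- partition key = booking_id, no clustering key
);

-- Every read supplies the partition key, so a lookup hits exactly one partition:
-- SELECT config FROM booking_config WHERE booking_id = 'BK-001';
```

Since there is only one configuration object per booking, there is nothing within the partition to sort or range-scan, which is exactly the case where clustering columns add no value.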
Partitioning: If I understood correctly, you don't want documents to be deleted automatically. I believe that if you don't set a Partition tag, no TTL will be applied, so the data will stay there until you call DeleteRow (I believe that's the name of the call that can be used).
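The difference can be illustrated in plain CQL (hypothetical table and values, not DataMiner-generated code):

```sql
-- Without a TTL, a row stays until it is explicitly deleted:
INSERT INTO booking_config (booking_id, config)
VALUES ('BK-001', '{"channels": 4}');
DELETE FROM booking_config WHERE booking_id = 'BK-001';

-- With a TTL (the behavior the Partition tag triggers for DATETIME columns),
-- the row expires on its own after the given number of seconds:
INSERT INTO booking_config (booking_id, config)
VALUES ('BK-002', '{"channels": 8}')
USING TTL 86400;  -- row disappears after one day
```

Since you need the last version of the configuration to survive the booking for later duplication, the no-TTL route with an explicit delete seems like the safer fit.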
I hope this helps,
Kind regards,
Hi Laurens, interesting and useful info to keep in mind. Thanks for sharing! In our case, we are talking about (tens of) thousands and not millions, so I believe this should be fine. Also note that, as we have one entry per ID, we will also manually delete these based on an expiration configuration in our manager element.
If the partition key is set to the booking ID, won't this cause high cardinality, resulting in slow lookups in the future when there are a lot of records?
According to the DataStax guidelines, suppose there's a table with millions of users:
-Avoid very low cardinality: if the partition key is set to gender (male/female), you end up with two very wide partitions, which is bad.
-Avoid very high cardinality: if the partition key is set to the user's e-mail address, a lookup may by chance find the data on the first node, but with bad luck, or when the row doesn't exist, it will need to go through all nodes.
It's kind of like looking up an address (suppose every street name in a country were unique). If the partition key is the country, that partition is easy to find, but all the streets inside it must then be scanned (low cardinality). If the partition key is the street, there are as many partitions as streets, and finding the right one means searching through all of them (high cardinality). If you add the city to the partition key, you get a faster lookup: first locate the partition for the city, then narrow it down to the street.
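The address analogy maps directly onto a composite partition key in CQL (again a hypothetical schema, purely for illustration):

```sql
-- Composite partition key (country, city): partitions stay small enough to
-- locate quickly, while street becomes a clustering column so the lookup
-- narrows down inside the city partition.
CREATE TABLE addresses (
    country text,
    city    text,
    street  text,
    PRIMARY KEY ((country, city), street)
);

-- The full partition key must be supplied; the clustering column then
-- pinpoints the row within that partition:
-- SELECT * FROM addresses
--  WHERE country = 'BE' AND city = 'Ghent' AND street = 'Main Street';
```

For the booking case in this thread, though, the booking ID as sole partition key should be fine at the stated scale, since every partition holds exactly one small-to-medium row and every read supplies the full key.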