Hi all,
From our documentation we can see that there is a length limit of 100 bytes for the primary key (https://docs.dataminer.services/develop/codingguidelines/Protocol/Primary_keys1.html). However, the documentation only refers to MySQL DBs. As we also support Cassandra and Elastic DBs, I was wondering whether this limitation still applies to those systems.
Let's leave performance/efficiency aside for now; I'm most interested in issues that could be caused by using primary keys that exceed this length.
Looking into the question below, it seems that the whole key is safe, but we still have this limitation and verification in our software. https://community.dataminer.services/question/what-is-the-maximum-length-of-the-primary-key-in-unicode
For instance, in a DMA system using Cassandra, do we expect any issues when using primary keys of 200 bytes? E.g. an increase in partition sizes due to rows being marked as deleted (because the key was not correctly found)...
Thanks,
Hi Flavio
There are no enforced limitations in the software on the size of the primary key; as you indicate, the supported size can differ depending on the database layer.
What we have seen is that when you have large partitions in Cassandra (e.g. a lot of trend data), the replication of those partitions across the Cassandra cluster starts to fail. (There is an advised limit of 100 MB per partition in Cassandra.)
Keeping this in mind, and knowing that a trend record takes on average 100 B, we see that per parameter we can trend about 1 million points (100 B x 1 million records = 100 MB) per trend window (real-time and the different average windows). If your primary key were bigger, the number of samples per parameter would have to decrease in order to stay below the advised Cassandra threshold (e.g. if your PK were 200 B, you could only store half the trending).
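To make those numbers concrete, here is a minimal back-of-envelope sketch (Python, purely illustrative): the ~100 MB advised partition limit and the ~100 B average trend record size are the figures from this answer; the exact record size in your setup may differ.

```python
# Rough estimate of how many trend records fit in a single Cassandra partition,
# based on the ~100 MB advised partition limit and ~100 B average record size.
ADVISED_PARTITION_LIMIT_BYTES = 100_000_000  # advised ~100 MB per partition


def max_points_per_partition(record_size_bytes: int) -> int:
    """Approximate number of trend records that stay under the advised limit."""
    return ADVISED_PARTITION_LIMIT_BYTES // record_size_bytes


# ~100 B per record -> roughly 1 million points per parameter per trend window.
print(max_points_per_partition(100))  # 1000000

# If a bigger primary key pushes the record to ~200 B, only half the points fit.
print(max_points_per_partition(200))  # 500000
```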
This is for trending; for non-volatile data (saved parameters in your driver, and primary keys are saved by default) the same (Cassandra) limit of 100 MB per partition applies. The bigger the PK, the bigger the partition of your elementdata will become. A partition in elementdata contains all the non-volatile parameters of that specific element.
As you already stated, the smaller the PK the better for performance reasons, but that is not always possible.
Hope this gives you some insight.
Hi Jan, it was exactly this kind of information that I was looking for. Thanks! 🙂
It seems the size of the keys can be rather large, so if performance is not a concern, this should be fine: https://tech.ebayinc.com/engineering/cassandra-data-modeling-best-practices-part-2/