Is there a roadmap from Skyline to make the DataMiner application run on / support cloud-native (containerized) infrastructure?
Hi Abdul,
Converting the core DataMiner application into a set of containerized microservices is currently not on the immediate roadmap, assuming that is specifically what you were referring to. Let me sketch the outline of our roadmap as it pertains to architectural evolutions of the DataMiner System. We mapped out our roadmap to gradually introduce architectural evolutions, in line with common industry and technology evolutions, without disruption for our considerable installed base (which is also an important consideration), focusing first of all on where we can deliver the most added value for our user base.
Firstly, from its inception DataMiner was designed as a distributed computing solution. It consists of so-called DMAs or DataMiner Agents, commonly referred to as DataMiner nodes. A single DataMiner System consists of multiple DataMiner nodes, and every node is a fully functional DataMiner System in its own right (i.e. if you deploy one DataMiner node, you essentially have a fully functional DataMiner System). If you need more capacity in terms of data collection & processing or storage, you simply add DataMiner nodes to the DataMiner System. This can be done at run-time: you can spin up a new DataMiner node, add it to an existing operational DataMiner System, and start adding new managed products or data sources to that node. From a user perspective, a DataMiner System, irrespective of the number of nodes, always behaves like a single entity. You can point your client to any node to connect to the DataMiner System, and it will provide you with a consolidated view of all the elements managed by that DataMiner System (assuming you have the security rights to see everything, because security obviously defines what you can access in terms of managed elements, independently of which DataMiner nodes are used to manage those elements).
Before moving on to our roadmap, I just want to add the following details. From the initial version of DataMiner, the messaging between the DataMiner nodes was done via a proprietary messaging bus that we developed ourselves (simply because back then distributed systems were not that common at all, and there wasn't much around in terms of industry-standard fabrics to tie nodes together). From a data storage perspective, DataMiner used an SQL database (MySQL initially). Each DataMiner node had its own MySQL instance, where it stored all of its data, such as the historical fault and performance records for the elements it was responsible for managing. Again, from a user perspective, a DataMiner System always behaved like a single entity: if you queried the user interface for the last 24 hours of alarms, those would automatically load into your Alarm Console, while in the background they were of course stored across the different MySQL instances of the individual DataMiner nodes. Essentially, the storage was a collection of individual MySQL instances isolated from one another (one per DMA node), and we handled everything needed to make this appear as a single consolidated solution.
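To make that "single entity" behavior a bit more tangible, here is a minimal, purely illustrative Python sketch of the scatter-gather pattern described above: a query for the last 24 hours of alarms is fanned out to every node's local store and the results are merged into one consolidated view. All names used here (Node, Alarm, query_alarms, etc.) are hypothetical and are not the actual DataMiner API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List

# Hypothetical model of DataMiner nodes, each with its own local (MySQL-style)
# store; this only illustrates the scatter-gather idea, not the real DataMiner API.

@dataclass
class Alarm:
    element: str
    severity: str
    timestamp: datetime

@dataclass
class Node:
    name: str
    local_store: List[Alarm] = field(default_factory=list)

    def query_alarms(self, since: datetime) -> List[Alarm]:
        # Each node only knows the alarms of the elements it manages itself.
        return [a for a in self.local_store if a.timestamp >= since]

def consolidated_alarms(nodes: List[Node], since: datetime) -> List[Alarm]:
    """Fan the query out to every node and merge the results, so the client
    sees one consolidated Alarm Console regardless of which node it is
    connected to."""
    merged: List[Alarm] = []
    for node in nodes:
        merged.extend(node.query_alarms(since))
    return sorted(merged, key=lambda a: a.timestamp, reverse=True)

if __name__ == "__main__":
    now = datetime.now()
    dma1 = Node("DMA-1", [Alarm("encoder-1", "critical", now - timedelta(hours=2))])
    dma2 = Node("DMA-2", [Alarm("switch-7", "minor", now - timedelta(hours=30)),
                          Alarm("switch-9", "major", now - timedelta(hours=1))])
    # Last 24 hours of alarms, consolidated across both nodes:
    for alarm in consolidated_alarms([dma1, dma2], now - timedelta(hours=24)):
        print(alarm)
```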
How did we decide to move forward in terms of architectural evolutions?
- In terms of data storage, we have evolved from SQL to Cassandra and Elastic. This evolution was necessary to accommodate the needs of our customers to store ever more data and to support more complex use of that data. At the same time, it means we have evolved from a DataMiner cluster where each node performed data collection, processing and storage (because each node was by default the DataMiner software plus its standalone MySQL instance), to a solution where you can completely separate the data collection & processing from the data storage at a system level. In other words, a DataMiner System today consists of a cluster of DataMiner nodes (responsible for data collection & processing), and separately you can have a single Cassandra and Elastic cluster for storage of the data. This means that you can now scale your data collection & processing on the one hand and your data storage on the other hand entirely independently of one another. If you need to store more data, you spin up extra Cassandra and/or Elastic nodes; if you have enough storage but need to scale up your data collection & processing capacity, you add more DataMiner nodes to the DataMiner cluster (see the first sketch after this list).
- We always want to apply industry-standard technology wherever that is possible and makes sense (e.g. we have always used industry-standard open data storage solutions). As distributed computing has become more mainstream, we also started looking at replacing the fabric between the DataMiner nodes (as mentioned, this was proprietary and initially developed by Skyline). Today, that fabric is gradually being replaced by a standard NATS messaging system (see the second sketch after this list). This is a major ongoing change, which will go pretty much unnoticed by most of our users as it is not very visible, but it is an important step that allows us to evolve further. It is more of a foundational, underlying evolution that serves as an enabler for other upcoming roadmap evolutions.
- Another important evolution is our public DataMiner Cloud Platform (DCP), for which we are about to launch the MVP, and with many more plans ahead. Firstly, I want to note that this is not a multi-tenant DataMiner System in a public cloud, and we are not asking / pushing / demanding our customers to move their DataMiner Systems to the public cloud. We have a massive number of DataMiner Systems deployed around the world that play a vital role in both the monitoring and orchestration of very important technology infrastructures in the media & telecom space. We refer to those systems as private DataMiner Systems (just to avoid confusion and clearly separate them from the public DataMiner Cloud Platform). Those private DataMiner Systems stay, and will continue to be deployed. The appetite from our user base to deploy a DataMiner System as an instance within a public multi-tenant cloud ecosystem (and to pay for it as a service) was extremely low when we made inquiries about it (mainly because of the nature of the solution, as well as its continuous high-volume data ingest & processing characteristics). Note that this doesn't mean that a private DataMiner System is always deployed on-premises: it is perfectly possible to deploy a DataMiner System off-premises, or even in a hybrid fashion. We have many users these days that deploy some or all of their DataMiner nodes in the cloud (and the same goes for the data storage nodes of course, i.e. the Cassandra and Elastic nodes). But the DataMiner Cloud Platform I'm referring to is completely complementary to those private DataMiner Systems, and independent of how those are deployed and scaled. It is hard to explain everything about DCP here (if you are on the Inside Track, you can access a recent presentation about DCP in the Dojo Video Library to get the full picture), but one of its functions is to be an enabler of a whole range of new capabilities for the private DataMiner Systems. For that, the private DataMiner System has to be what we call cloud-connected (i.e. you take your private DataMiner System and set up a single secured connection with DCP, and this single connection will then enable a growing set of new features and capabilities for your private DataMiner System). To give you one example, which is launching in preview as part of the DCP MVP: Dashboard Sharing. Dashboard Sharing is part of a broader Live Data Sharing service, but what it essentially does is allow you to take a dashboard in your private DataMiner System and share it live with anybody out there in the world, in a secured and scalable fashion, with just a few clicks. In other words, you get an out-of-the-box, on-demand data sharing capability (similar to sharing a Dropbox file, so to speak) in your private DataMiner System, as if it were running as a public cloud solution. And this is just a small example to illustrate the kind of add-on services we want to focus on with DCP for cloud-connected private DataMiner Systems. There is much more to come, also related to the deployment of DataMiner Systems for example, to further facilitate that and, down the road, create more agility to scale up and down on demand.
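To illustrate the independent scaling described in the first bullet above, here is a small, hypothetical Python sketch of a sizing model in which the DataMiner cluster (collection & processing) and the Cassandra/Elastic clusters (storage/indexing) are scaled separately. The class names and node counts are invented for illustration only.

```python
from dataclasses import dataclass

# Hypothetical sizing model, only to illustrate that data collection & processing
# (DataMiner nodes) and data storage (Cassandra / Elastic nodes) now scale
# independently of one another. Names and numbers are invented for this example.

@dataclass
class ClusterSizing:
    dataminer_nodes: int   # data collection & processing capacity
    cassandra_nodes: int   # data storage capacity
    elastic_nodes: int     # indexing / search capacity

def scale_for_storage(sizing: ClusterSizing, extra_cassandra: int, extra_elastic: int) -> ClusterSizing:
    """Need to store more data? Add storage nodes only; compute is untouched."""
    return ClusterSizing(sizing.dataminer_nodes,
                         sizing.cassandra_nodes + extra_cassandra,
                         sizing.elastic_nodes + extra_elastic)

def scale_for_collection(sizing: ClusterSizing, extra_dataminer: int) -> ClusterSizing:
    """Need more collection & processing capacity? Add DataMiner nodes only."""
    return ClusterSizing(sizing.dataminer_nodes + extra_dataminer,
                         sizing.cassandra_nodes,
                         sizing.elastic_nodes)

if __name__ == "__main__":
    current = ClusterSizing(dataminer_nodes=4, cassandra_nodes=3, elastic_nodes=3)
    print(scale_for_storage(current, extra_cassandra=2, extra_elastic=1))
    print(scale_for_collection(current, extra_dataminer=2))
```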
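And as for the NATS fabric mentioned in the second bullet: the snippet below does not show how DataMiner uses NATS internally (the subject name and payload are made up); it is just a generic publish/subscribe example with the standard nats-py client, assuming a NATS server is reachable on localhost, to give an idea of the kind of industry-standard messaging that replaces the former proprietary bus.

```python
import asyncio
import nats  # nats-py client: pip install nats-py

# Generic NATS publish/subscribe example. This is NOT DataMiner's internal
# message schema; it only illustrates the kind of standard messaging fabric
# that replaces the original proprietary inter-node bus.

async def main():
    nc = await nats.connect("nats://127.0.0.1:4222")

    async def on_message(msg):
        print(f"[{msg.subject}] {msg.data.decode()}")

    # One node subscribes to a subject...
    await nc.subscribe("dms.node2.events", cb=on_message)

    # ...and another node publishes to it.
    await nc.publish("dms.node2.events", b"element state changed")
    await nc.flush()

    await asyncio.sleep(0.1)  # give the callback a moment to run
    await nc.drain()

if __name__ == "__main__":
    asyncio.run(main())
```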
The above is pretty much what we have been focusing on recently. Apologies for the lengthy outline, but I just wanted to sketch the full picture here. In a nutshell, the above results in a DataMiner System consisting of A) a deployment of so-called DataMiner nodes (taking care of data collection & processing) and a deployment of Cassandra and Elastic nodes (for data storage / indexing), which can each be scaled independently according to the needs of the individual user, and which, as mentioned, can be deployed partly or entirely on-premises or off-premises; and B) the option to make that private DataMiner System cloud-connected, so that via our public DataMiner Cloud Platform it is further enhanced with on-demand services (such as sharing a dashboard with live data on demand, in a secured and scalable fashion, out of the box).
Now obviously this is not an end point but just a start, or should I say that evolution is continuous and never-ending. So what is on the roadmap for the next evolutions, and what are we focusing on now:
- Element Swarming: we have already started on the element swarming capability and we are eager to finish it now. Let me frame that. Today, an element (a resource/product managed by DataMiner) is managed by a specific DataMiner node. If that node disappears for whatever reason, there are limited options to recover and ensure continuity of the management of the elements that node was taking care of. You can make sure you have an up-to-date image of that node and spin it up again, you can deploy a new blank node and restore a backup of the failed node, or you can deploy a hot-standby node (i.e. we have an option to deploy nodes in a 1+1 setup, where both nodes sync automatically and one node automatically takes over if the other 'disappears'). With any of those scenarios, however, there is a certain time frame during which the elements that were managed by the failed node are no longer managed. These days that time frame is no longer acceptable; it is too long. With element swarming, which we have already demonstrated in a lab environment, elements that were managed by a node that failed will very quickly recover by swarming out to any of the other still-active nodes (provided, of course, that those nodes have spare capacity). Essentially this means that any node in a DataMiner cluster can fail and all remaining nodes will instantly take over the data collection and processing tasks of the failed node, as long as there is spare capacity in the system. And if the overall system capacity shrinks due to failed nodes, the swarming will instantly ensure continuity of the monitoring and orchestration, but at some point it will be important to automatically spin up new DataMiner nodes to compensate for the failed ones and restore the desired nominal system capacity. Element Swarming is now a top priority to finish (a conceptual sketch of the basic idea follows a bit further below).
- Next up, after that, we will probably start focusing on the DataMiner nodes themselves (i.e. the nodes that are responsible for data collection & processing, from which we have split off the data storage and into which we have put the NATS fabric). And this is probably the actual point you were asking about, if I'm not mistaken. Today, this node can easily be deployed in a highly automated fashion, either on-premises or off-premises, typically as a VM. As for the kind of deployment: Cassandra and Elastic can be deployed as containers (and, for the sake of completeness, we are also in the process of validating Cassandra and Elastic as an AWS service, so that customers have an alternative to running their own Cassandra and Elastic deployment), and we could do the same for the existing DataMiner node (i.e. run it as a container as well, but we have not validated that yet, as there has not been much demand for it from the user base so far).
But what about further breaking the DataMiner node down into smaller chunks / microservices running in separate containers (which is the essence of your question, I presume)? Is there any further considerable value or benefit in that, and how does that weigh against the engineering effort? Today we are considering first splitting off a data collection capability from this data collection & processing node. In other words, a containerized, cloud-native data collection node that can be deployed as a remote front-end data collector for the DataMiner System (focusing on data ingest and first-line data processing tasks such as data grooming, aggregation, obfuscation, etc.); a conceptual sketch of such a collector also follows below. Some users have expressed interest in such a capability down the road, in order to be able to deploy a lightweight data collector (rather than the current DataMiner node) into various parts of their operation (e.g. spread across a data center, into a public cloud ecosystem, etc.). Note that what I mention here is still in a conceptual phase; we haven't made any decisions, nor have we started development, so we'll be happy to organize a meeting with you (or with anybody else reading this who wants to engage with us on this subject) to exchange further thoughts.
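To make the swarming idea from the first bullet above a bit more concrete, here is a minimal, purely conceptual Python sketch: when a node fails, its elements are redistributed across the remaining nodes that still have spare capacity. This is emphatically not the actual swarming implementation, and all names and capacity figures are invented; it only illustrates the principle.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Conceptual illustration of element swarming: not the real implementation,
# just the redistribution principle. Names and capacities are invented.

@dataclass
class DmaNode:
    name: str
    capacity: int                       # max number of elements this node can manage
    elements: List[str] = field(default_factory=list)

    @property
    def spare(self) -> int:
        return self.capacity - len(self.elements)

def swarm_out(failed: DmaNode, survivors: List[DmaNode]) -> Dict[str, str]:
    """Reassign the failed node's elements to the surviving nodes with the
    most spare capacity. Returns a mapping of element -> new node, and raises
    if the cluster no longer has enough spare capacity overall."""
    moves: Dict[str, str] = {}
    for element in failed.elements:
        target = max(survivors, key=lambda n: n.spare)
        if target.spare <= 0:
            raise RuntimeError("No spare capacity left; new nodes must be spun up")
        target.elements.append(element)
        moves[element] = target.name
    failed.elements.clear()
    return moves

if __name__ == "__main__":
    dma1 = DmaNode("DMA-1", capacity=3, elements=["enc-1", "mux-4"])
    dma2 = DmaNode("DMA-2", capacity=3, elements=["switch-7"])
    dma3 = DmaNode("DMA-3", capacity=3, elements=["sat-2", "sat-3"])

    # DMA-3 fails; its elements swarm out to the nodes with spare capacity.
    print(swarm_out(dma3, [dma1, dma2]))
```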
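And for the conceptual data collection node mentioned just above, here is an equally hypothetical Python sketch of what such a lightweight, containerizable front-end collector could look like: it ingests raw records, applies first-line processing (grooming, aggregation, obfuscation), and forwards the result to the core system. None of this reflects an actual design decision or API; it is only meant to feed the discussion.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List

# Hypothetical lightweight front-end collector: ingest -> groom -> aggregate
# -> obfuscate -> forward. Purely conceptual; not an actual DataMiner design.

@dataclass
class RawRecord:
    source: str        # e.g. device hostname or IP
    metric: str
    value: float

def groom(records: Iterable[RawRecord]) -> List[RawRecord]:
    """First-line grooming: drop records that are obviously invalid."""
    return [r for r in records if r.value >= 0]

def aggregate(records: List[RawRecord]) -> Dict[str, float]:
    """Aggregate per metric (here: a simple average) to reduce the volume
    shipped to the core DataMiner System."""
    per_metric: Dict[str, List[float]] = {}
    for r in records:
        per_metric.setdefault(r.metric, []).append(r.value)
    return {metric: mean(values) for metric, values in per_metric.items()}

def obfuscate(source: str) -> str:
    """Obfuscate identifying details before data leaves this site."""
    return f"site-A/device-{abs(hash(source)) % 1000:03d}"

def forward_to_core(payload: dict) -> None:
    # In a real collector this would publish to the core system
    # (e.g. over the messaging fabric); here we just print it.
    print("forwarding:", payload)

if __name__ == "__main__":
    raw = [RawRecord("encoder-1.example", "bitrate_mbps", 12.1),
           RawRecord("encoder-1.example", "bitrate_mbps", 11.9),
           RawRecord("encoder-2.example", "temperature_c", -999)]  # invalid reading
    groomed = groom(raw)
    payload = {"source": obfuscate("encoder-1.example"), "metrics": aggregate(groomed)}
    forward_to_core(payload)
```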
Again, apologies for the lengthy response, but I just wanted to paint the full picture and show you where we came from, where we are today and where we are going in the next iterations of our roadmap. I would be very happy to schedule a call or meeting to elaborate on this further and also hear your thoughts (and, for that matter, the thoughts of anybody else who is interested in engaging with us on this subject; the better we understand the needs of our users, the more we can focus on evolving DataMiner in the areas where we add the most value).