General database failure. (hr = 0x80040226)

Solved, 481 views, 28th August 2024
Miloš Sedláček [DevOps Advocate] 676, 17th July 2024, 0 Comments

Hello community,

we have deployed a separate 3-node Cassandra cluster (and a 3-node OpenSearch cluster) as well as one lab DMA, and we connected the lab DMA to these clusters.

We added some elements to generate data (alarms, trends). Then we tested the stability of the system by bringing down database nodes (both Cassandra and OpenSearch nodes). All tests went well, even better than expected: for example, the historical alarms were still shown when all 3 Cassandra nodes were down (I guess the DMA uses the offloaded data in this case).

But then we shut down all servers to move them. Once they were booted again, we noticed one element in error.

The error says there is a general database failure and that the element therefore cannot start. So we repeated the “test” and shut everything down once again. After that, the element that was initially in error was OK, but 2 other elements had the same error.

After some time spent manipulating them in Cube (stopping and activating them multiple times, …), one of these two became OK. The other one is still in error.

Other interesting data:

Element log:

2024/07/17 17:05:22.045|SLProtocol - 22284 - TNS4200-1 - copy|19560|ElementDataPagedQuerier::Query|ERR|-1|Could not start reading the element data for 38603/7: 0x80131500: 
2024/07/17 17:05:22.047|SLProtocol - 22284 - TNS4200-1 - copy|19560|CProtocol::InitializeParameters|ERR|0|Failed to query elementdata for 38603/7: General database failure.
2024/07/17 17:05:22.047|SLProtocol - 22284 - TNS4200-1 - copy|19560|CProtocol::Init|ERR|0|InitializeParameters failed General database failure. (hr = 0x80040226)
2024/07/17 17:05:22.048|SLDataMiner.exe - TNS4200-1 - copy|15024|CElement::Start|ERR|-1|InitializeProtocol for Element TNS4200-1 - copy failed with General database failure.. (hr = 0x80040226)
2024/07/17 17:05:22.049|SLDataMiner.exe - TNS4200-1 - copy|15024|CElement::Activate|ERR|-1|Start failed. General database failure. (hr = 0x80040226)
2024/07/17 17:05:22.049|SLDataMiner.exe - TNS4200-1 - copy|15024|CElement::SetState|DBG|0|** Setting state from 1 to 10 (RealState = 4)
2024/07/17 17:05:22.049|SLDataMiner.exe - TNS4200-1 - copy|15024|CElement::SetState|DBG|0|** Setting state finished.

Database connection log (this error appears whenever the element in error is activated):

2024/07/17 17:44:05.872|SLDBConnection|StartPagedRead|INF|0|73|System.AggregateException: One or more errors occurred. ---> Cassandra.ReadFailureException: Server failure during read query at consistency Quorum (2 response(s) were required but only 0 replica(s) responded, 2 failed)
...
...
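
The "2 response(s) were required" figure in the log line above matches Cassandra's quorum rule, floor(RF / 2) + 1. The following is a minimal sketch of that arithmetic; the replication factor of 3 is an assumption for this 3-node cluster (the post does not state it), and the function name is made up for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Number of replica responses a QUORUM read or write needs."""
    return replication_factor // 2 + 1


# Assumed replication factor of 3 for the 3-node cluster in the post.
for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}")
```

With RF = 3 this gives 2, so a Quorum read fails as soon as two replicas abort the query, which is what the exception reports.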

Logs from the Cassandra node (evidently failing to read the data of the element in error, 38603/7):

ERROR [ReadStage-2] 2024-07-17 15:44:05,896 NoSpamLogger.java:111 - Scanned over 100001 tombstones during query 'SELECT v, vu FROM dmsdemo_elementdata.elementdata WHERE d = 38603 AND e = 7 AND p > 6004 AND i > '290602.520.5102' LIMIT 5000 ALLOW FILTERING' (last scanned row token was -4197853928192400396 and partion key was ((38603, 7), 17010, 131985986)); query aborted
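
When several elements are affected, it can help to pull the tombstone count and the d/e key values out of such log lines automatically. The sketch below does that with a regular expression. It assumes the message format shown in the line above, and the function and field names are hypothetical.

```python
import re

# Matches the "Scanned over N tombstones" message as it appears in the log
# line above; other Cassandra versions may word it differently.
TOMBSTONE_RE = re.compile(
    r"Scanned over (?P<count>\d+) tombstones.*?"
    r"FROM (?P<table>[\w.]+) WHERE d = (?P<d>\d+) AND e = (?P<e>\d+)"
)


def parse_tombstone_abort(line: str):
    """Return count, table and the d/e key values, or None if no match."""
    match = TOMBSTONE_RE.search(line)
    if match is None:
        return None
    return {
        "count": int(match["count"]),
        "table": match["table"],
        "d": int(match["d"]),
        "e": int(match["e"]),
    }


sample = (
    "ERROR [ReadStage-2] 2024-07-17 15:44:05,896 NoSpamLogger.java:111 - "
    "Scanned over 100001 tombstones during query 'SELECT v, vu FROM "
    "dmsdemo_elementdata.elementdata WHERE d = 38603 AND e = 7 AND p > 6004'"
)
print(parse_tombstone_abort(sample))
```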

Running the query from the log above manually in cqlsh returns this:

Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed - received 0 responses and 2 failures: READ_TOO_MANY_TOMBSTONES from /10.185.116.211:7000, READ_TOO_MANY_TOMBSTONES from /10.185.116.213:7000" info={'consistency': 'ONE', 'required_responses': 1, 'received_responses': 0, 'failures': 2, 'error_code_map': {'10.185.116.211': '0x0001', '10.185.116.213': '0x0001'}}

So I assume the data related to the element got corrupted in the Cassandra cluster and can’t be read by the DMA. I also ran nodetool repair --full on one Cassandra node, but without effect.

My questions come down to these:

  • Why did this situation happen? The cluster is there to provide robustness and avoid loss of data.
  • Why did some elements revert to a normal state if the assumption is that the data was lost?
  • Isn’t there a way to force the element in error to rewrite the essential data that is blocking it from starting normally?

Regards,

Milos

Marieke Goethals [SLC] [DevOps Catalyst] Selected answer as best 28th August 2024

1 Answer

Michiel Saelen [SLC] [DevOps Enabler] 5.63K, posted 18th July 2024, 1 Comment

Hi Milos,

It’s great to hear how you are testing the system, thanks for sharing that. When you use OpenSearch together with a Cassandra cluster, alarms are stored in OpenSearch. This is likely why alarms were still accessible when the Cassandra DB was stopped.

The elementdata (saved parameters) is stored in Cassandra. At startup, the element first tries to retrieve its saved parameters from the Cassandra DB. This failed, which is why you see the error on the element. The SLDBConnection logging also confirms that the requests are failing.

The cqlsh result tells us that the query for that element failed because too many tombstones were encountered. This is most likely because the saved parameters in the connector/element are updated too frequently, or because non-volatile tables have too many rows being added and removed (keys are saved by default to keep the alarm root time).

This question will tell you what can cause tombstones in Cassandra in general:

What will cause Cassandra tombstones ? – DataMiner Dojo
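
To make the point about rows being added and removed more concrete, here is a toy model of tombstone build-up. It rests on one fact about Cassandra: a tombstone only becomes eligible for removal by compaction once gc_grace_seconds (864000 seconds, i.e. 10 days, by default) has passed. Everything else is an assumption made for illustration: the workload of one removed row every 5 seconds, the function name, and the idealization that compaction purges every eligible tombstone immediately.

```python
GC_GRACE_SECONDS = 864_000  # Cassandra's default gc_grace_seconds (10 days)
DAY = 86_400


def scannable_tombstones(delete_times, now, gc_grace=GC_GRACE_SECONDS):
    """Count tombstones that are still too young to be purged at `now`.

    Idealized: assumes compaction removes every tombstone as soon as it is
    older than gc_grace, so real counts can be higher.
    """
    return sum(1 for t in delete_times if now - t < gc_grace)


# Hypothetical workload: one row removed every 5 seconds for 7 days.
deletes = list(range(0, 7 * DAY, 5))

for day in (7, 12, 17):
    print(f"day {day}: {scannable_tombstones(deletes, now=day * DAY)} tombstones")
```

In this model the count stays above 100000 for the whole first week and only starts to drop once the oldest deletes pass the grace period, which is one way a read can begin to succeed again without any intervention.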

Miloš Sedláček [DevOps Advocate] commented 22nd July 2024

Hi Michiel,

thank you for the answer. It showed us the right way to investigate. We did indeed have an improper tombstone configuration in Cassandra (the tombstone thresholds in cassandra.yaml).
The element in error spontaneously went back to OK, and now we know this happened because the tombstone count dropped below the error threshold. We will likely avoid this situation in the future, as we increased the threshold as suggested in the SL doc.

Still, we want to understand how to manage tombstones, so we will need to get more familiar with this. Thank you again for the help!
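
The threshold behaviour described in this comment can be sketched as follows. The two values are the defaults of tombstone_warn_threshold and tombstone_failure_threshold in a stock cassandra.yaml; the values actually configured on this cluster are not given in the thread, and the function name is hypothetical.

```python
# Defaults from a stock cassandra.yaml; the values on the poster's cluster
# before and after the change are not stated in the thread.
TOMBSTONE_WARN_THRESHOLD = 1_000
TOMBSTONE_FAILURE_THRESHOLD = 100_000


def read_outcome(tombstones_scanned: int,
                 warn: int = TOMBSTONE_WARN_THRESHOLD,
                 fail: int = TOMBSTONE_FAILURE_THRESHOLD) -> str:
    """Classify a read by the number of tombstones it scanned."""
    if tombstones_scanned > fail:
        return "aborted"   # replica fails the read (READ_TOO_MANY_TOMBSTONES)
    if tombstones_scanned > warn:
        return "warning"   # read succeeds, a warning is logged
    return "ok"


for scanned in (500, 50_000, 100_001):
    print(scanned, read_outcome(scanned))
```

The "Scanned over 100001 tombstones" message in the question is consistent with a failure threshold of 100000. The same read succeeds once the count falls to or below the threshold, or once the threshold is raised, although raising it only hides the build-up and makes the reads heavier.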
