Hi everyone,
I’m testing the Kafka Consumer protocol (Generic Kafka Consumer) across two DataMiner agents with identical configuration parameters, both meant to consume messages from an AWS MSK (Kafka) cluster and store the consumed JSON data in a local directory.
However, I’m facing a strange issue:
- On DMA #1, the consumer connects to the Kafka brokers successfully, consumes messages, and writes the JSON files as expected.
- On DMA #2, with the same configuration, it continuously logs:

[thrd:main]: Cluster connection already in progress: coordinator query
[thrd:main]: Not selecting any broker for cluster connection: still suppressed: no cluster connection
Error: sasl_ssl://b-1.kafkaqa...:9096/bootstrap: Connect to ipv4#10.xxx.xxx.xxx:9096 failed: Unknown error (after 21043ms in state CONNECT)

It never reaches a connected state or produces any JSON output file.
We checked the basics:
- The IPs resolve fine; we can ping the broker IPs directly from the DMA's command prompt.
- Both DMAs are using SASL/SCRAM over SSL on port 9096.
- The same credentials and topic are used, and the brokers are accessible via AWS MSK from other tools (like Lambda).
- Both elements point to the same dataminer-protocol-feature-tests topic, use the same directory structure for file output, and have identical protocol parameters.
What I’d like to understand
- Is there any additional TCP or SSL requirement for Kafka connections beyond ICMP reachability (ping)?
- Could there be a local Windows or SSL certificate store dependency that's missing or outdated on the non-working DMA?
- Are there specific librdkafka configuration options or certificates that must be present per hosting agent for SASL_SSL connections to succeed?
- Is there a recommended diagnostic log level or Kafka debug flag within the protocol to trace SSL/TLS handshake issues? (See the sketch after this list.)
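Since the log lines above look like librdkafka output (and the question already assumes librdkafka under the hood), here is a minimal, hedged sketch of the client properties and debug contexts that usually matter for SASL_SSL against MSK. It uses the confluent-kafka Python client (also librdkafka-based) as a standalone reproduction outside DataMiner; the broker address, credentials, group ID, and CA path are placeholders, and the exact parameter names exposed by the protocol may differ.

```python
# Standalone librdkafka-based consumer to reproduce the connection outside DataMiner.
# Broker, credentials, group ID and CA path are placeholders; debug output goes to stderr.
from confluent_kafka import Consumer

conf = {
    "bootstrap.servers": "b-1.kafkaqa.example.com:9096",  # placeholder MSK broker
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "SCRAM-SHA-512",
    "sasl.username": "my-user",        # placeholder
    "sasl.password": "my-password",    # placeholder
    # If the trust chain differs between the two hosts, pointing librdkafka at an
    # explicit PEM bundle removes the dependency on the local certificate store.
    "ssl.ca.location": r"C:\certs\msk-ca.pem",  # placeholder path
    "group.id": "dataminer-connectivity-test",  # placeholder group
    "auto.offset.reset": "earliest",
    # Debug contexts that surface TCP connect and TLS handshake problems.
    "debug": "broker,security,protocol",
}

consumer = Consumer(conf)
consumer.subscribe(["dataminer-protocol-feature-tests"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print("Consumer error:", msg.error())
            continue
        print("Received:", msg.value())
except KeyboardInterrupt:
    pass
finally:
    consumer.close()
```

If this standalone client fails on the problem host with the same "Connect ... failed" error, the cause is network or TLS on that host rather than the element configuration; if it connects, the comparison narrows the issue down to the protocol's own settings.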
Environment summary
- Kafka cluster: AWS MSK (SASL/SCRAM-SHA-512, SSL, port 9096)
- Protocol: Generic Kafka Consumer / Custom Skyline Kafka Consumer
- DM version: (add your version, e.g. 10.4.0.0)
- Hosting: two DMAs (same version), different Windows hosts
- Behavior: works perfectly on one DMA; fails to connect on the other
Any insight on what might cause this “connect failed / no cluster connection” behavior when ping and DNS are fine would be greatly appreciated!
Maybe it's worth checking whether there is a firewall issue? You could verify that a TCP connection can be made by running the following command in PowerShell from the DMA: 'tnc 10.x.x.x -Port 9096'
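On the first question above: ping only proves ICMP routing. A SASL_SSL Kafka connection additionally needs the TCP port open end to end (security groups, NACLs, local Windows Firewall) and a successful TLS handshake against a CA chain the client trusts. Complementing the Test-NetConnection suggestion, here is a small hedged Python sketch (host and port are placeholders) that separates a firewall problem from a certificate/TLS problem:

```python
# Quick reachability and TLS handshake probe for a Kafka broker.
# Host and port are placeholders; run this on the failing DMA host.
import socket
import ssl

HOST = "b-1.kafkaqa.example.com"  # or the broker IP, placeholder
PORT = 9096

# 1) Plain TCP connect: failure here points to firewall / security group / routing.
with socket.create_connection((HOST, PORT), timeout=10) as sock:
    print("TCP connect OK")

    # 2) TLS handshake: failure here points to certificate trust or TLS version issues.
    context = ssl.create_default_context()
    # For a first test (especially when addressing the broker by IP), hostname
    # verification can be relaxed; re-enable it once the handshake itself works.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with context.wrap_socket(sock, server_hostname=HOST) as tls:
        print("TLS handshake OK:", tls.version())
```

If step 1 fails only on the problem host, compare security group/NACL rules and the local Windows Firewall between the two hosts; if step 2 fails, compare the installed CA chain (for MSK this typically chains to an Amazon Trust Services root) on both machines.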