Hi Community,
I just set up a three-server Cassandra cluster.
Cassandra and Cassandra-Reaper were running fine on the first node.
Then I tried to start Cassandra and Cassandra Reaper on the second server.
The Cassandra-Reaper start fails, but I don't see any errors in the log file (even with debug level).
After I stopped Cassandra-Reaper on the first node, I can't start it there either.
The only thing I noticed in the log: Cassandra-Reaper connects not only to the local server, but also to the other server which is not specified in the Cassandra-Reaper config.
Is this expected?
nodetool status:
Datacenter: xyz
==================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.2.163  264.7 KiB   16      51.2%             bb7e5e1b-92bf-4711-b20b-4a3f96327fe7  rack1
UN  192.168.2.162  305.52 KiB  16      48.8%             ca386982-cca8-4c09-ad35-c63c475700e9  rack1
Parts of the Log:
DEBUG [main] c.d.d.c.Cluster - Starting new cluster with contact points [/192.168.2.163:9042]
...
DEBUG [main] c.d.d.c.H.STATES - [Control connection] established to /192.168.2.163:9042
INFO [main] c.d.d.c.p.DCAwareRoundRobinPolicy - Using data-center name 'xyz' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
INFO [main] c.d.d.c.Cluster - New Cassandra host /192.168.2.163:9042 added
INFO [main] c.d.d.c.Cluster - New Cassandra host /192.168.2.162:9042 added
DEBUG [main] c.d.d.c.H.STATES - [/192.168.2.163:9042] preparing to open 1 new connections, total = 2
DEBUG [main] c.d.d.c.H.STATES - [/192.168.2.162:9042] preparing to open 1 new connections, total = 1
DEBUG [MCC R1-nio-worker-1] c.d.d.c.Connection - Connection[/192.168.2.163:9042-2, inFlight=0, closed=false] Connection established, initializing transport
DEBUG [MCC R1-nio-worker-2] c.d.d.c.Connection - Connection[/192.168.2.162:9042-1, inFlight=0, closed=false] Connection established, initializing transport
...
Log ends with:
INFO [main] i.c.m.j.JmxManagementConnectionFactory - Initializing JMX seed list for all clusters...
INFO [main] i.c.m.j.JmxManagementConnectionFactory - Initialized JMX seed list for all clusters.
DEBUG [main] o.e.j.u.c.ContainerLifeCycle - ServletHandler@37df14d1{STOPPED} added {crossOriginRequests==org.eclipse.jetty.servlets.CrossOriginFilter@2a4f8009{inst=false,async=true,src=EMBEDDED:null},AUTO}
DEBUG [main] o.e.j.u.c.ContainerLifeCycle - ServletHandler@37df14d1{STOPPED} added {[/*]/[]/[ERROR, FORWARD, REQUEST, ASYNC, INCLUDE]=>crossOriginRequests,POJO}
INFO [main] i.c.ReaperApplication - creating and registering health checks
INFO [main] i.c.ReaperApplication - creating resources and registering endpoints
Service status:
Active: failed (Result: exit-code)
Duration: 2.805s
Docs: http://cassandra-reaper.io/
Process: 19805 ExecStart=/usr/local/bin/cassandra-reaper (code=exited, status=1/FAILURE)
Main PID: 19805 (code=exited, status=1/FAILURE)
CPU: 8.285s
cassandra-reaper.yaml (partially):
segmentCountPerNode: 64
repairParallelism: DATACENTER_AWARE
repairIntensity: 0.9
scheduleDaysBetween: 7
repairRunThreadCount: 15
hangingRepairTimeoutMins: 30
storageType: cassandra
enableCrossOrigin: true
incrementalRepair: false
blacklistTwcsTables: true
enableDynamicSeedList: true
repairManagerSchedulingIntervalSeconds: 10
activateQueryLogger: false
jmxConnectionTimeoutInSeconds: 5
useAddressTranslator: false
maxParallelRepairs: 2
# If jmx access is restricted to localhost, then configure to SIDECAR.
datacenterAvailability: SIDECAR
cassandra:
  clusterName: "xyz"
  contactPoints: ["192.168.2.163"]
  keyspace: reaper_db
  loadBalancingPolicy:
    type: tokenAware
    shuffleReplicas: true
    subPolicy:
      type: dcAwareRoundRobin
      localDC:
      usedHostsPerRemoteDC: 0
      allowRemoteDCsForLocalConsistencyLevel: false
  authProvider:
    type: plainText
    username: some
    password: thing
autoScheduling:
  enabled: true
  initialDelayPeriod: PT15S
  periodBetweenPolls: PT10M
  timeBeforeFirstSchedule: PT5M
  scheduleSpreadPeriod: PT6H
Any idea how to solve or further troubleshoot?
Thanks,
Felix
Thanks Wale!
I added the second available IP to contactPoints and inserted the cluster name in localDC.
When I run sudo journalctl -u cassandra-reaper.service
I can see:
Mar 20 13:42:16 xyz-cassandra-02 systemd[1]: Started cassandra-reaper.service - Reaper for Apache Cassandra.
Mar 20 13:42:16 xyz-cassandra-02 cassandra-reaper[25157]: Using reaper in target
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException: No repair unit exists for 87f6a6c0-d158-11ee-b1>
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2052)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at com.google.common.cache.LocalCache.get(LocalCache.java:3943)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3967)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4952)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4958)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.cassandrareaper.storage.repairunit.CassandraRepairUnitDao.getRepairUnit(CassandraRepairUnitDao.java:121)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.cassandrareaper.service.RepairScheduleService.registerScheduleMetrics(RepairScheduleService.java:150)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.cassandrareaper.service.RepairScheduleService.lambda$registerRepairScheduleMetrics$0(RepairScheduleService.java:144)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.cassandrareaper.service.RepairScheduleService.registerRepairScheduleMetrics(RepairScheduleService.java:144)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.cassandrareaper.service.RepairScheduleService.<init>(RepairScheduleService.java:52)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.cassandrareaper.service.RepairScheduleService.create(RepairScheduleService.java:57)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.cassandrareaper.service.ClusterRepairScheduler.<init>(ClusterRepairScheduler.java:55)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.cassandrareaper.resources.ClusterResource.<init>(ClusterResource.java:89)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.cassandrareaper.resources.ClusterResource.create(ClusterResource.java:108)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.cassandrareaper.ReaperApplication.run(ReaperApplication.java:212)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.cassandrareaper.ReaperApplication.run(ReaperApplication.java:87)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.dropwizard.cli.EnvironmentCommand.run(EnvironmentCommand.java:59)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:98)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.dropwizard.cli.Cli.run(Cli.java:78)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.dropwizard.Application.run(Application.java:94)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.cassandrareaper.ReaperApplication.main(ReaperApplication.java:99)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: Caused by: java.lang.IllegalArgumentException: No repair unit exists for 87f6a6c0-d158-11ee-b1ce-71e1216ef784
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.cassandrareaper.storage.repairunit.CassandraRepairUnitDao.getRepairUnitImpl(CassandraRepairUnitDao.java:116)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.cassandrareaper.storage.repairunit.CassandraRepairUnitDao.access$000(CassandraRepairUnitDao.java:40)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.cassandrareaper.storage.repairunit.CassandraRepairUnitDao$1.load(CassandraRepairUnitDao.java:51)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at io.cassandrareaper.storage.repairunit.CassandraRepairUnitDao$1.load(CassandraRepairUnitDao.java:49)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3524)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2273)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2156)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2046)
Mar 20 13:42:18 xyz-cassandra-02 cassandra-reaper[25157]: ... 21 more
Mar 20 13:42:18 xyz-cassandra-02 systemd[1]: cassandra-reaper.service: Main process exited, code=exited, status=1/FAILURE
Mar 20 13:42:18 xyz-cassandra-02 systemd[1]: cassandra-reaper.service: Failed with result 'exit-code'.
Mar 20 13:42:18 xyz-cassandra-02 systemd[1]: cassandra-reaper.service: Consumed 7.430s CPU time.
What’s next?
An unusual error indeed, so let's check more of the underlying setup.
1) Stop Reaper on all nodes.
2) Drop the reaper_db keyspace and recreate it with a replication factor of 2, or 3 if you have 3 nodes.
3) Ensure the same Reaper yml file is used on all nodes.
4) Start Reaper.
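For step 2, here is a minimal sketch that only prints the CQL statements so you can review them before pasting them into cqlsh. The function name is made up for illustration, and NetworkTopologyStrategy with the datacenter name xyz (taken from the nodetool status output) and a replication factor of 2 are assumptions, so adjust them to your cluster. Keep in mind that dropping the keyspace discards the schedules and run history Reaper has stored there.

```python
from __future__ import annotations


def reaper_keyspace_cql(keyspace: str, datacenter: str, replication_factor: int) -> list[str]:
    """Return the CQL statements that drop and recreate the Reaper keyspace.

    Hypothetical helper: it builds text only and never talks to Cassandra.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    # Assumption: NetworkTopologyStrategy with a single datacenter.
    replication = (
        "{'class': 'NetworkTopologyStrategy', "
        f"'{datacenter}': {replication_factor}}}"
    )
    return [
        f"DROP KEYSPACE IF EXISTS {keyspace};",
        f"CREATE KEYSPACE {keyspace} WITH replication = {replication};",
    ]


if __name__ == "__main__":
    # Values from this thread: keyspace reaper_db, datacenter xyz.
    for statement in reaper_keyspace_cql("reaper_db", "xyz", 2):
        print(statement)
```

Run the printed statements once from any node while Reaper is stopped everywhere; schema changes apply to the whole cluster.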
Which Linux distro, Cassandra version, and Reaper version are you using?
Hi Wale,
Sorry for the delay.
We are using the latest Debian, Cassandra 4.1.4, and Reaper 3.5.0.
Indeed, dropping reaper_db and recreating it solved the issue.
Thanks for your support!
Have a nice weekend,
Felix
A couple of things:
1) Is there any useful info in the system logs (/var/log/messages) when you try to start Reaper on the first node?
2) The Reaper yml file should list all the node IPs in your cluster under ‘contactPoints:’.
3) The Reaper yml file should indicate the datacenter in ‘localDC:’.
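To double-check point 2, a rough sketch like the one below compares the addresses from nodetool status with the contactPoints list. The function names and the regular expression are my own for illustration (not part of Reaper or nodetool), and it assumes IPv4 addresses and the usual two-letter status prefix such as UN.

```python
from __future__ import annotations

import re

# Matches nodetool status node rows such as "UN  192.168.2.163  264.7 KiB ...".
# First letter: Up/Down. Second letter: Normal/Leaving/Joining/Moving.
NODE_ROW = re.compile(r"^[UD][NLJM]\s+(\d{1,3}(?:\.\d{1,3}){3})\s")


def node_addresses(nodetool_status: str) -> list[str]:
    """Extract the IPv4 address of every node row in nodetool status output."""
    addresses = []
    for line in nodetool_status.splitlines():
        match = NODE_ROW.match(line.strip())
        if match:
            addresses.append(match.group(1))
    return addresses


def missing_contact_points(nodetool_status: str, contact_points: list[str]) -> list[str]:
    """Return the cluster addresses that are absent from contactPoints."""
    return [a for a in node_addresses(nodetool_status) if a not in contact_points]


if __name__ == "__main__":
    # Node rows copied from the nodetool status output in this thread.
    status = """\
UN 192.168.2.163 264.7 KiB 16 51.2% bb7e5e1b-92bf-4711-b20b-4a3f96327fe7 rack1
UN 192.168.2.162 305.52 KiB 16 48.8% ca386982-cca8-4c09-ad35-c63c475700e9 rack1
"""
    # contactPoints as posted in the original cassandra-reaper.yaml.
    print(missing_contact_points(status, ["192.168.2.163"]))
```

An empty list means every node reported by nodetool status is already listed in contactPoints.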