Keeping Cassandra Alive

加入 Oracle技术学习QQ群:171092051




服务热线 : 400-690-3643 备用电话: 13764045638 邮箱


Troubleshooting Cassandra under pressure

This is the second blog post in the series. This is a bit more technical than the first one. I will explain some things that can be made to keep a cluster/server running when you are having problems in that cluster.

There were a lot of changes in Cassandra over the last 4 years (from 0.8 to 2.1), so I will refrain from discussing troubleshooting problems that affect some specific versions. Also, this is the kind of troubleshooting you need when you can’t “add a node”.

Why can’t I just add a node? Well, if you aren’t on vnodes, and you didn’t pre-calculate the token ranges, adding a node is a big effort. Other constrains may also apply, like budget or deliver time for hardware (if you are on bare metal). Plus, rack capacity, power constrains, etc…

Now you may say:

“Ok, we can’t add a node! What should we do?! We have a storm coming!”

So, I did navigate over that storm and it’s not an easy task, but it’s doable! First thing, you have to know what you have, that is critical! You also need to know where you can take more damage.

Let’s assume you have the following situation, and what I recommend for it:

  • Heavy Write Cluster, Low Read

Now let’s define “storm”: A storm is not when when Cassandra fails, it’s about an unanticipated load increase or a disaster. What happens is that you have more load than your planned capacity (Either because of failure of nodes or because of a sudden traffic increase). This will increase your resource usage to a point where your machines will start to die.

Let’s understand what can cause a Cassandra process to die, and a probably the machine (If you OOM and you didn’t configure swap… I warned you!) for the scenario described above.

  1. More data to the commitlog = more I/O pressure (Discard if you have commitlog on a different HDD)
  2. Data is written to memtables = Memory is used
  3. Memtables reach thresholds faster, get flushed to disk = I/O pressure
  4. Compaction starts faster and frequently = I/O pressure, CPU pressure
  5. Too many I/O compaction can’t compact fast enough and the memtables aren’t flushing fast enough = Memory not being released.
  6. Too much memory usage, JVM triggers GC more frequently = CPU pressure
  7. JVM can’t release memory = OOM
  8. OOM = PUM! Node dies (if you are “lucky” kernel will kill Cassandra)

And I didn’t go trough the hints that would be stored as nodes became unresponsive and send out once they get back online.

So now we know where our pain points are. Let’s understand them and see what we can do about it:

  • Commitlog – Let’s just assume you have this on separate HDD, and don’t do anything about it (after all it’s your safeguard).
  • Memtables – We can control how often they are flushed. It is a possible tweak point. Although it requires a Cassandra restart for the changes to produce an effect.
  • Compaction – This we can control via nodetool, inclusive we can disable it in the later versions.
  • JVM GC – We can change settings, but difficult to tweak and a restart is needed.
  • Swap – We can play a bit here if we do have a swap partition.
  • Dirty_ratio – How often the data is actually written to the HDD. This can put your data at risk, but also can help.
  • Replication Factor – this can be changed on the fly, will help by having less pressure on the nodes.

So, what do to? Where to start? It depends on a case by case scenario. I would probably make my Read performance suffer to keep the writes getting in. To allow that, the easiest way should be making the reads CL = ONE. That sometimes does look like the fast and easy option. But if you’re writes are not using Quorum or/and you have read_repair… You will spread more writes (And RF>1). I would pick compaction as my first target, you can always try to get it up to pace (re-enable, increase compaction throughput). Another option would be increase dirty_ratio and risk losing data (trusting the commitlogs + RF>1 helps not losing data) but this will give your HDD more room until the cluster recovers.

But every case is a case. I will talk about my own case, problems and resolutions this Wednesday at the Datastax Day in London! Fell free to join me!



扫码关注 微信公众号:



TEL/電話+86 13764045638
QQ 47079569