For the past few months, I’ve been teaching at companies that are heavy users of Apache Storm. They’re also undertaking massive projects to move off of Storm. During that time, all I could say was that something new was coming that might convince them to consider an alternative.
Now I’m free to talk about that alternative: Twitter has open sourced Heron.
Heron is what replaced Storm at Twitter. It has been running in production at Twitter for over two years. We aren’t talking about toy processing jobs; these are the mission-critical jobs that power Twitter’s real-time processing.
Twitter has a substantial investment in code written using the Storm API. To avoid rewriting everything, Heron continues to use Storm’s API. Behind the scenes, all of that Storm code now runs on Heron instead of Storm.
In my informal talks with Twitter engineers who were part of the Storm-to-Heron migration, there were no rewrites; they simply recompiled their code with the Heron JARs. The majority of user-facing changes were operational and infrastructural.
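Concretely, the migration path is a dependency swap rather than a rewrite. Here’s a sketch of what that could look like in a Maven build, assuming a `com.twitter.heron:heron-storm` compatibility artifact (check the Heron documentation for the actual coordinates and version):

```xml
<!-- Before: building against Storm -->
<!--
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-core</artifactId>
  <version>your-storm-version</version>
</dependency>
-->

<!-- After: the Heron jar that implements the Storm API -->
<dependency>
  <groupId>com.twitter.heron</groupId>
  <artifactId>heron-storm</artifactId>
  <version>see-heron-releases</version>
</dependency>
```

The topology code itself, spouts, bolts, and `TopologyBuilder` wiring, stays as-is; only the jar it compiles against changes.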
I first learned about Heron at Strata NYC 2015. It intrigued me because there was a real need for Storm improvements. I attended Karthik’s talk to learn about their results and how they achieved them.
The most notable improvement for me was a 2-5x performance boost without changing a line of code. That lets companies with large Storm codebases get the performance improvements they need. This speedup came from rethinking and rewriting the entire framework, including writing performance-critical pieces in C++ while user code remains in Java.
There are other operational advantages too. One complaint about Storm was that all topologies ran in a single JVM process. This made it difficult to debug and profile a topology while it was acting up or having problems. Heron runs each topology in its own process. This process isolation makes it much easier to debug, and it gives you better visibility into which topology, or which part of a topology, is misbehaving.
Process isolation improved Heron’s scalability too. It gives Heron the ability to execute a large number of instances for each topology on a single node. Heron can also launch and track a large number of topologies. Heron doesn’t come with its own scheduler; instead, it uses proven schedulers such as Apache Mesos.
Storm provides at-least-once and at-most-once processing models. That means a piece of data could be processed multiple times or at most a single time. In practice, at-least-once processing led to some Storm topologies doing 2-3 times the actual amount of processing when dealing with failures. While Heron keeps the at-least-once and at-most-once processing models, its overhead has been shown to be only about 3%. This number comes from production workloads running at scale. Compared to Storm’s at-most-once processing, Heron also provides much higher throughput.
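To see where the 2-3x overhead comes from, consider what at-least-once delivery does on failure: a tuple that isn’t acked is replayed from the spout, so a bolt repeats work it already did. The following is a toy simulation of that replay loop (illustrative only, not Storm or Heron code; all names are made up):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class AtLeastOnceOverhead {

    // Runs `n` tuples through a pretend bolt where every tuple fails
    // `failures` times before it is finally acked; returns the total
    // number of bolt invocations.
    static int boltInvocations(int n, int failures) {
        Deque<Integer> pending = new ArrayDeque<>();
        for (int t = 0; t < n; t++) {
            pending.add(t);
        }
        Map<Integer, Integer> attempts = new HashMap<>();
        int invocations = 0;
        while (!pending.isEmpty()) {
            int tuple = pending.poll();
            invocations++; // the bolt does the work whether or not the tuple later fails
            int seen = attempts.merge(tuple, 1, Integer::sum);
            if (seen <= failures) {
                pending.add(tuple); // attempt "fails": the spout replays the tuple
            }
            // otherwise the tuple is acked and retired
        }
        return invocations;
    }

    public static void main(String[] args) {
        // If every tuple fails twice before succeeding, the topology does
        // three times the "actual" amount of work -- the 2-3x effect above.
        System.out.println(boltInvocations(1000, 2)); // prints 3000
    }
}
```

With zero failures the invocation count equals the tuple count, which is the ~3% overhead regime the Heron numbers describe: replays happen, but rarely.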
With Heron, we have another great option for real-time processing. It joins the list of real-time technologies proven to function at scale. If you have an investment in Storm, I highly suggest you take a look at Heron.