- Twitter has open sourced Heron, which replaced Storm at Twitter and has been running in production for over 2 years.
- Heron continues to use Storm's API, allowing companies with large Storm codebases to get performance improvements without having to rewrite everything.
- Heron provides a 2-5x performance boost without changing a line of code, achieved by rethinking and rewriting the full framework, including writing performance-critical pieces in C++ and user code in Java.
- Heron runs each topology in its own process, making it easier to debug and providing better visibility into which topology or part of the topology is misbehaving.
- Heron's process isolation improves scalability, allowing it to execute a large number of instances for each topology on a single node and launch and track a large number of topologies.
For the past few months, I’ve been teaching at companies that are heavy users of Apache Storm. They’re also undertaking massive projects to move off of Storm. During that time, I would tell them that something new was coming that might convince them to consider an alternative.
Now, I’m free to talk about that alternative. Twitter has open sourced Heron.
About Heron
Heron is what replaced Storm at Twitter, where it has been running in production for over 2 years. We aren’t talking about toy processing jobs; these are the mission-critical jobs that power Twitter’s real-time processing.
Twitter has a substantial investment in code written using the Storm API. To keep from having to rewrite everything, Heron continues to use Storm’s API. Behind the scenes, all of that Storm code now runs on Heron instead of Storm.
In my informal talks with Twitter engineers who were part of the Storm-to-Heron migration, there were no rewrites; they simply recompiled their code against the Heron JARs. The majority of user-facing changes were operational and infrastructure-related.
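To make that concrete, here is a minimal sketch of the kind of topology code that carries over, written against the Storm API (assuming the older `backtype.storm` package names that Heron’s compatibility layer targets). The spout and bolt classes (RandomSentenceSpout, SplitSentenceBolt, WordCountBolt) are hypothetical placeholders for existing user components; the point is that the same source compiles and submits unchanged once the Heron JARs replace the Storm JARs on the classpath.

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class WordCountTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Hypothetical spout and bolts standing in for existing Storm components.
        builder.setSpout("sentences", new RandomSentenceSpout(), 2);
        builder.setBolt("split", new SplitSentenceBolt(), 4)
               .shuffleGrouping("sentences");
        builder.setBolt("count", new WordCountBolt(), 4)
               .fieldsGrouping("split", new Fields("word"));

        Config conf = new Config();
        conf.setNumWorkers(2);

        // The same call submits the topology whether the Storm or Heron JARs
        // are on the classpath; no source changes are required.
        StormSubmitter.submitTopology("word-count", conf, builder.createTopology());
    }
}
```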
Heron Results
I first learned about Heron at Strata NYC 2015. It intrigued me because there was a real need for Storm improvements. I attended Karthik’s talk to learn about their results and how they achieved them.
The most notable result for me was a 2-5x performance boost without changing a line of code. That lets companies with large Storm codebases get the performance improvements they need. This speed improvement came from rethinking and rewriting the full framework, much of which involved writing performance-critical pieces in C++ while keeping user code in Java.
There are other operational advantages too. One complaint about Storm was that all topologies ran in a single JVM process. That made it difficult to debug and profile a topology that was acting up or having problems. Heron runs each topology in its own process. This process isolation makes it much easier to debug, and you have better visibility into which topology, or which part of a topology, is misbehaving.
The process isolation improved Heron’s scalability too. It gave Heron the ability to execute a large number of instances for each topology on a single node. Heron can also launch and track a large number of topologies. Heron doesn’t come with its own scheduler; instead, it uses proven schedulers such as Apache Mesos.
Storm provides at-least-once and at-most-once processing models. That means a piece of data could be processed more than once or, at most, a single time. In practice, at-least-once processing led to some Storm jobs processing 2-3 times the actual amount of data when dealing with failures. While Heron keeps the at-least-once and at-most-once processing models, its overhead from duplicate processing has been shown to be only about 3%. This number comes from production workloads running at scale. In comparison to Storm’s at-most-once processing, Heron also provides much higher throughput.
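To illustrate where those duplicates come from, below is a minimal sketch of a bolt written against the Storm acking API, which is the mechanism behind at-least-once delivery in both Storm and Heron. The UppercaseBolt class is purely illustrative: anchored tuples that are never acked are typically replayed by a reliable spout, and those replays are the duplicate processing measured above.

```java
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Illustrative bolt showing anchoring and acking, the basis of at-least-once delivery.
public class UppercaseBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // Anchoring the emitted tuple to the input ties it into the tuple tree,
        // so a downstream failure causes the spout to replay the original tuple.
        collector.emit(input, new Values(input.getString(0).toUpperCase()));

        // Acking marks the input as fully processed at this bolt. Tuples that
        // time out or are failed get replayed, which is where the duplicate
        // processing under at-least-once semantics comes from.
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("upper"));
    }
}
```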
The Ecosystem
We have another great option for real-time processing in Heron. It joins the list of real-time technologies proven to function at scale. If you have an investment in Storm, I highly suggest you take a look at Heron.