Heron, Twitter’s real-time streaming and analytics platform, has been donated to the Apache Incubator program. Heron was open sourced in 2016 and is used to process the billions of events generated at Twitter every day.
Heron is a directed acyclic graph (DAG) data processing engine that is backwards compatible with Apache Storm, which the same team open sourced in 2011 and donated to Apache. Heron was designed to improve on Storm both for developers and for operations, and it also added native support for Apache Aurora.
Aurora is a Mesos framework developed by Twitter for long-running services and cron jobs that runs applications and services across a shared pool of machines, and is responsible for keeping them running, forever. When machines experience failure, Aurora intelligently reschedules those jobs onto healthy machines.
Heron introduced several new ideas in stream processing, starting with back-pressure, which adjusts the pace at which a topology executes to match its slowest component. Back-pressure describes what happens when a slow element in a pipeline cannot keep up: its input buffer fills, it cannot accept more data, and data backs up in the elements upstream of it.
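The general idea can be illustrated with a small, deterministic simulation. This is a minimal sketch of back-pressure on a single bounded buffer, not Heron code: the function `simulate` and its parameters are hypothetical names chosen for illustration. A fast producer is forced to wait whenever the buffer in front of a slow consumer is full, so its effective rate drops to the consumer's rate.

```python
from collections import deque


def simulate(ticks, produce_per_tick, consume_per_tick, capacity):
    """Run a producer and a slower consumer joined by a bounded buffer.

    Returns (max_queue_len, produced, consumed).
    All names here are illustrative assumptions, not part of Heron.
    """
    queue = deque()
    produced = 0
    consumed = 0
    max_queue_len = 0

    for _ in range(ticks):
        # The producer may emit only while the downstream buffer has room.
        for _ in range(produce_per_tick):
            if len(queue) >= capacity:
                break  # back-pressure: the upstream stage has to wait
            queue.append(produced)
            produced += 1

        max_queue_len = max(max_queue_len, len(queue))

        # The slow consumer drains a limited number of items per tick.
        for _ in range(min(consume_per_tick, len(queue))):
            queue.popleft()
            consumed += 1

    return max_queue_len, produced, consumed


if __name__ == "__main__":
    print(simulate(ticks=10, produce_per_tick=5, consume_per_tick=2, capacity=8))
```

Once the buffer is full, the producer can emit only as many items per tick as the consumer removes, which is the "pace of the slowest component" behavior described above.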
In the diagram below, bolt B3 (in container A) receives all of its inputs from spout S1. B3 is running more slowly than other components. In response, the SM (Stream Manager) for container A will refuse input from the SMs in containers C and D, which will lead to the socket buffers in those containers filling up, which could lead to throughput collapse.
In a situation like this, Heron’s back-pressure mechanism will kick in. The SM in container A will send a message to all the other SMs. In response, each of the other SMs will examine its container’s physical plan and cut off inputs from any spouts that feed bolt B3 (in this case spout S1).
Once the lagging bolt (B3) begins functioning normally, the SM in container A will notify the other SMs and stream routing within the topology will return to normal.
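The start and stop messages described above can be sketched as a toy model. This is an illustration under stated assumptions rather than Heron's implementation: the `StreamManager` class, its methods, the `connect` helper, and the dictionary form of the physical plan are all hypothetical, and the placement of spout S1 in container C and a second spout S2 in container D is assumed for the example.

```python
class StreamManager:
    """Toy stand-in for a per-container Stream Manager (SM). Hypothetical API."""

    def __init__(self, name, local_spouts):
        self.name = name
        self.local_spouts = set(local_spouts)  # spouts hosted in this container
        self.clamped = set()                   # spouts currently cut off
        self.peers = []                        # the other SMs in the topology

    def start_back_pressure(self, slow_bolt, plan):
        # The SM hosting the lagging bolt notifies every other SM.
        for peer in self.peers:
            peer.on_start(slow_bolt, plan)

    def stop_back_pressure(self, slow_bolt, plan):
        # Sent once the lagging bolt is functioning normally again.
        for peer in self.peers:
            peer.on_stop(slow_bolt, plan)

    def on_start(self, slow_bolt, plan):
        # Consult the plan and cut off only local spouts that feed the slow bolt.
        self.clamped |= self.local_spouts & plan[slow_bolt]

    def on_stop(self, slow_bolt, plan):
        # Resume the spouts that were cut off for this bolt.
        self.clamped -= plan[slow_bolt]


def connect(managers):
    """Make every SM aware of all the others."""
    for sm in managers:
        sm.peers = [other for other in managers if other is not sm]


if __name__ == "__main__":
    # Assumed plan: maps each bolt to the set of spouts that feed it.
    plan = {"B3": {"S1"}}
    sm_a = StreamManager("A", [])
    sm_c = StreamManager("C", ["S1"])
    sm_d = StreamManager("D", ["S2"])
    connect([sm_a, sm_c, sm_d])

    sm_a.start_back_pressure("B3", plan)
    print(sm_c.clamped, sm_d.clamped)

    sm_a.stop_back_pressure("B3", plan)
    print(sm_c.clamped, sm_d.clamped)
```

In this sketch only the spout that feeds the slow bolt is cut off, so a spout that does not feed B3 keeps emitting while back-pressure is active.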
Heron is also modular, which enables multi-language support and alternative implementations of individual modules, and it provides isolation at various levels to ease debugging and troubleshooting.
Heron’s other key benefits include native containerization, with support for cgroups and Docker, and the ability to support diverse workloads in a single deployment.
Since Heron was open sourced in 2016, developments from the community include new APIs for Java and Python, alongside a low-level API for Python similar to the Storm API. Another improvement, achieved in collaboration with Microsoft, is Dhalion, a technology that lets Heron self-tune, self-heal and self-stabilize when topologies behave unexpectedly because of system behavior or changes in data rate and volume.
Heron will now continue development within the Apache Incubator.