PayPal has made HERA, its High Efficiency Reliable Access to data stores, open source. Hera is a data access gateway that PayPal uses to scale database access for hundreds of billions of SQL queries per day.
Hera is made up of a language-specific Hera client library that resides on each application host and allows applications to communicate with Hera efficiently, alongside a centralized Hera proxy service that is fully aware of database configurations and application requirements.
This service receives application requests and routes them to a chosen database instance. The Router, Parser, & Query Rewriting modules are all part of the Hera proxy. A set of workers that are responsible for maintaining connections to a specific database then handle communication between the proxy and this database. Hera was originally written in C++ but has recently been rewritten in GoLang.
PayPal developed Hera to scale thousands of its applications with connection multiplexing, read-write split, and sharding. The product was developed to handle the problem PayPal found when it moved to using a microservices architecture in the application tier. This had the side effect of creating many application servers and an associated increase in pooled DB connections, with large numbers of persistent connections in-bound at the DB.
To overcome the need for persistent connections, PayPal built a connection multiplexer that could multiplex many inbound database connections, most of which were idle, from the application tier to a small set of active database connections in the DB tier. This was improved on with SQL parsing so it could differentiate between transactional and non-transactional requests.
Hera also supports a read/write split to ensure that spikes in the number of reads don’t affect write performance or vice-versa. It also converts queries from applications to SQL that ensures the query is compatible with sharding. Hera uses sharding to control data redistribution. The sharding logic is kept on the server to avoid the need for individual applications to manage sharding on the client side.
Hera also has a surge queue that deals with high volumes of service requests. At times of high demand, requests can be put into the queue and processed as soon as another service request is complete. It can also be configured to remove a slow query to allow multiple more typical queries to run.
Hera’s sharding is based on Oracle RAC clusters. Several RAC clusters are used to mimic a single logical database, with logical shards distributed over the RACs and shard routing handled by Hera. Hera uses shard key column bind name and value to determine how to direct queries to the appropriate shard. The parsing component is used to detect and extract the shard key from SQL Where clauses, and the routing component routes queries to a database based on the shard_id extracted by the Parsing component
Hera is available on GitHub.