The Haypenny vision is for everybody on Earth to trade goods and services using digital currencies every day, many times a day, at a per-transaction cost low enough that the engine can offer free transactions across the internet.
To realize that vision, we invented a new paradigm for value transfer, block-split-combine, and developed an entirely new kind of transaction engine that can cost-effectively scale to millions of transactions per second with absolute transaction integrity, multi-site redundancy, and consistently low latency.
The system described here achieves the lowest possible cost per transaction for an ultra-high-scale system: two network operations and two non-volatile storage operations per transaction side, per data center location (typically three locations, called asynchronously).
To achieve this, we made key system and application design decisions: a greatly simplified Haypenny financial transaction model consisting of only two basic operations; a four-tier system model of client, real-time, near-real-time, and offline support systems; and an entirely purpose-built software system composed of several scratch-built components.
The resulting system has been benchmarked and cost-modeled to show that it can scale to its initial goal of one million peak transactions per second and 100 billion transactions per month for approximately $0.000004 per monthly transaction in total data center fees. This is achieved while writing each transaction to multiple, decentralized data stores in real time and memorializing each transaction forever in indelible WORM storage within approximately 30 seconds.
The first step in creating an ultra-high capacity, ultra-low cost per transaction system is to carefully define the application model in a way that reduces system complexity and facilitates key optimizations.
The core function of the Haypenny system is to enable the transfer of numerically delineated value from one entity to another. The Haypenny system pares this functionality down to its absolute essence, defining only one core system object--called a "block"--consisting of an identifier, a balance, and a currency identifier (also known as the "realm" or "coin id").
A Block has two possible operations: split and combine. A split operation subtracts an amount from one block's balance and creates a new block holding that amount; a combine operation takes two blocks, decrements the balance of one of the blocks, and adds that amount to the balance of the other.
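A minimal sketch of these two operations in C follows. The field names and sizes, and the identifier-generating helper, are illustrative assumptions for this document, not Haypenny's production data layout.

    #include <stdint.h>

    /* Illustrative block record: an identifier, a balance, and a
       currency ("realm") identifier, per the definition above.          */
    typedef struct {
        uint8_t  id[16];     /* block identifier (128 bits assumed here)   */
        uint64_t balance;    /* balance in smallest currency units         */
        uint32_t realm;      /* currency identifier ("realm" / "coin id")  */
    } block_t;

    /* Hypothetical helper that fills 'id' with a fresh unique identifier. */
    extern void new_block_id(uint8_t id[16]);

    /* Split: subtract 'amount' from 'src' and create a new block holding it. */
    int block_split(block_t *src, uint64_t amount, block_t *out_new) {
        if (amount == 0 || amount > src->balance)
            return -1;                     /* insufficient balance          */
        src->balance -= amount;
        new_block_id(out_new->id);
        out_new->balance = amount;
        out_new->realm   = src->realm;     /* same currency as the source   */
        return 0;
    }

    /* Combine: move 'amount' from 'src' into 'dst' (the "combine block"). */
    int block_combine(block_t *src, block_t *dst, uint64_t amount) {
        if (src->realm != dst->realm || amount == 0 || amount > src->balance)
            return -1;
        src->balance -= amount;
        dst->balance += amount;
        return 0;
    }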
The Haypenny paradigm has no notion of identity outside of one's knowledge of the block identifier, and is thus akin to a digital form of physical cash: possession of the identifier alone confers the value.
The Haypenny system is broken up into four execution tiers, each with their own distinct response time and uptime requirements: client, real-time, near-real-time, and offline.
This segmentation allows the most difficult technical problem, the real-time transaction request response, to be completely isolated and pared down to only the requirements for responding to the request.
The system's four tiers, therefore, support each other: the near-real-time systems offload the real-time systems, and the offline systems offload the near-real-time systems. Client systems also serve to simplify the task for the real-time systems and the system's API is designed to facilitate this.
The real-time and near-real-time systems are treated as a set and are duplicated in multiple geographically separate data centers, typically at least three locations for "N+2" redundancy, e.g. the ability to safely continue real-time operations even if an entire data center is disrupted.
Here are the details of the four types of subsystems:
The Haypenny system is completely "API driven" from the front-end standpoint in that no end-user display content is delivered from any Haypenny systems. Rather, content is served entirely through either a content delivery network or a mobile app store. Front-end content, therefore, is a non-issue for scalability and cost, since this approach supports a virtually limitless amount of traffic at very low cost.
Haypenny's real-time systems provide a REST API that allows Haypenny transactions to be created and collected. This API is provided over HTTPS, as is typical for services of this kind.
Haypenny's real-time systems include:
While it is the most generic component of the Haypenny system, this component is critically important, not only for the necessary task of dividing requests among a pool of Front-end Processors, but also for providing a "pinhole" security approach that stops probing attacks on the Front-end Processors and aids in mitigating distributed denial-of-service (DDoS) attacks.
This is a pool of systems that unpack and interpret API calls, make calls to the Data Table Processors, and return a properly formatted reply to the caller. Calls to the Data Table Processors are performed asynchronously and in parallel, so the overall latency to the data store is bounded by the single slowest call.
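Although the Front-end Processors themselves are written in Java (as noted below), the fan-out pattern can be sketched in C using poll(): both requests are issued back to back and the caller waits until both replies arrive, so the total wait is bounded by the slower of the two calls. Socket setup, message framing, and error handling are deliberately simplified assumptions.

    #include <poll.h>
    #include <unistd.h>

    /* Send one request to each of two Data Table Processor sockets, then
       wait until both replies have arrived. Assumes connected sockets and
       that each reply fits in a single read (a simplification).          */
    int fanout_two(int fd_a, int fd_b,
                   const char *req_a, size_t len_a,
                   const char *req_b, size_t len_b,
                   char *resp_a, size_t cap_a,
                   char *resp_b, size_t cap_b)
    {
        if (write(fd_a, req_a, len_a) < 0) return -1;
        if (write(fd_b, req_b, len_b) < 0) return -1;

        struct pollfd pfds[2] = {
            { .fd = fd_a, .events = POLLIN },
            { .fd = fd_b, .events = POLLIN },
        };
        int pending = 2;

        while (pending > 0) {
            if (poll(pfds, 2, 5000) <= 0)      /* 5 s timeout for the example */
                return -1;
            if (pfds[0].revents & POLLIN) {
                if (read(fd_a, resp_a, cap_a) <= 0) return -1;
                pfds[0].fd = -1;               /* done: poll() ignores -1 fds */
                pending--;
            }
            if (pfds[1].revents & POLLIN) {
                if (read(fd_b, resp_b, cap_b) <= 0) return -1;
                pfds[1].fd = -1;
                pending--;
            }
        }
        return 0;
    }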
The Front-end Processors are written in Java and use the Jetty HTTP server framework. The decision to go with this framework rather than pure C was made deliberately: in benchmarks, and in terms of overhead alone, Jetty responded to HTTP requests within 20% of the performance of the fastest pure C framework. That overhead, however, represents only about 20% of the overall API call time, the balance being taken up by the IO to the Data Table Processors. A move to pure C and a relatively less-tested HTTP server framework would therefore yield only approximately 4% lower latency. Java, on the other hand, offers significant advantages over C in code safety, maintainability, and the availability of already-tested, secure libraries for various operations.
Front-end Processors are responsible for all "housekeeping" tasks, including system metrics. These systems do not track individual requests: access logging for system control and monitoring purposes is left to the load balancer component (non-transactional, "imperfect" tracking), while the Data Table Processors handle the primary transaction logging.
The Data Table Processors are written in C and connect to the Front-end Processors using an open TCP/IP socket. They maintain a memory-based hash table of their blocks.
Data Table Processors are arranged in a sharding approach wherein each instance contains blocks and transactions of a certain modulo. These systems hold all Haypenny block identifiers in existence, and grow both "vertically" (data size) and "horizontally" (request volume) by adding shards.
Data Table Processors store a Haypenny block very efficiently: the core data structure for a Block is 36 bytes, including hashtable overhead. Hence, even a relatively modest system by today's standards can store billions of Haypenny blocks, and larger systems and additional shards will allow for trillions of active blocks, which is far above long-term projections.
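The 36-byte figure and the modulo-based shard placement can be illustrated with one hypothetical layout. The actual field sizes and the way the modulo is derived from the identifier are not specified here, so the sketch below (which hashes the identifier before taking the modulo) is an assumption-laden illustration only.

    #include <stdint.h>
    #include <stdio.h>

    /* One hypothetical in-memory layout consistent with the stated
       36 bytes per block, including hash-table overhead:
       16 + 8 + 4 + 8 = 36 bytes (GCC/Clang packing shown).              */
    struct block_entry {
        uint8_t  id[16];              /* block identifier (size assumed)  */
        uint64_t balance;             /* balance in smallest units        */
        uint32_t realm;               /* currency ("realm") identifier    */
        struct block_entry *next;     /* hash-bucket chain pointer        */
    } __attribute__((packed));

    /* Illustrative shard selection: hash the identifier, then take the
       modulo over the number of shards.                                  */
    static unsigned shard_for(const uint8_t id[16], unsigned shard_count) {
        uint64_t h = 0xcbf29ce484222325ULL;        /* FNV-1a 64-bit basis  */
        for (int i = 0; i < 16; i++)
            h = (h ^ id[i]) * 0x100000001b3ULL;    /* FNV-1a 64-bit prime  */
        return (unsigned)(h % shard_count);
    }

    int main(void) {
        printf("sizeof(struct block_entry) = %zu\n", sizeof(struct block_entry));
        printf("shard for an all-zero id, 16 shards: %u\n",
               shard_for((const uint8_t[16]){0}, 16));
        return 0;
    }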
The Data Table Processor also handles memorializing each transaction component to non-volatile, decentralized storage to ensure absolute data integrity. This subsystem consists of a rotating set of reused log files that ensure a single operation incurs no more than a single IO to non-volatile storage. A transaction is not reported as completed to the end user until the data is physically written (not merely cached) to redundant non-volatile storage, both locally and on a server in at least one physically separate geographic location.
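A minimal sketch of the acknowledge-only-after-a-hard-write rule, assuming a POSIX file API with O_DSYNC so that each log append is a single synchronous storage IO; the real log record format, rotation logic, and remote replication path are not shown.

    #define _POSIX_C_SOURCE 200809L
    #include <fcntl.h>
    #include <unistd.h>

    /* Open the current log segment. O_DSYNC makes each write() return only
       after the data is on stable storage, so one append is one hard IO.  */
    int open_log_segment(const char *path) {
        return open(path, O_WRONLY | O_CREAT | O_APPEND | O_DSYNC, 0600);
    }

    /* Append one transaction record; only when this returns 0 may the
       caller acknowledge the transaction to the end user.                 */
    int log_append_durable(int log_fd, const void *record, size_t len) {
        ssize_t written = write(log_fd, record, len);
        return (written >= 0 && (size_t)written == len) ? 0 : -1;
    }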
This system is the most "conventional" of all Haypenny's components, as it is a front-end HTTP system handling APIs connected to an off-the-shelf RDBMS. Its purpose is to handle metadata services for the system.
Unlike the Haypenny transaction systems, this system has a much easier set of requirements:
The overall design approach for the Haypenny real-time systems is to perform the absolute minimum amount of processing possible on the real-time systems such that they can return their response as quickly as possible. This is enabled by Haypenny's near-real-time systems that run in the background and on separate hardware systems.
Near-real-time systems handle the other half of the high-scale, ultra-low-latency transaction logging system: rotating logs on each Data Table Processor system when they are full and moving the data to permanent locations on the network, including disaster recovery sites. This tier also creates periodic per-shard snapshots that allow near-instant recovery after a single-system failure.
The other effect of a centralized near-real-time system is that it can coordinate transaction rollbacks across shards. Split and Combine operations each consist of two Data Table Processor calls, potentially to two different shards. If one shard fails to memorialize the operation after the first call succeeded, the operation must be rolled back on the first shard. This is normally coordinated by the Front-end Processor, but that system could itself fail at the exact moment a rollback becomes necessary (indeed, if a Front-end Processor fails, there is a high chance it will fail between the two calls to the Data Table Processors, since this is an IO wait state). Because of this, part of the near-real-time system's job is to ensure that transactions always come in pairs; if they do not, it rolls back the operation as necessary to maintain system integrity.
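The pair check can be sketched as follows. The log-entry fields and the rollback hook are assumptions for illustration, and a production version would use an indexed structure rather than this quadratic scan.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Every split or combine should appear as exactly two log entries
       (one per shard/side), identified here by a shared operation id
       and a side number; entries with no partner are rolled back.       */
    typedef struct {
        uint64_t op_id;   /* id shared by both halves of one operation   */
        uint8_t  side;    /* 1 = first shard call, 2 = second shard call */
    } log_entry_t;

    /* Hypothetical hook that undoes the surviving half on its shard. */
    extern void rollback_half(const log_entry_t *orphan);

    void reconcile_pairs(const log_entry_t *entries, size_t n) {
        for (size_t i = 0; i < n; i++) {
            bool paired = false;
            for (size_t j = 0; j < n; j++) {
                if (i != j &&
                    entries[j].op_id == entries[i].op_id &&
                    entries[j].side  != entries[i].side) {
                    paired = true;
                    break;
                }
            }
            if (!paired)
                rollback_half(&entries[i]);   /* orphaned half: roll back */
        }
    }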
Haypenny's offline systems include a data-warehouse-style database that holds an (encrypted) copy of all transactions, blocks, and other system metadata for use in offline processing. This system is connected to applications that perform auditing of transactions (and allow external auditors to do the same), analysis of the system's usage for system management purposes*, and other analysis related to the running of the service, such as editorial control over metadata.
The offline system realm also includes the final storage location for the transaction log: a Write Once Read Many (WORM) storage mechanism that complies with FINRA Rule 4511 and SEC Rule 17a-4(f). All transactions are written to this medium within approximately 30 seconds of their execution on the real-time systems.
(*Currently, over 40 different application-level metrics are gathered from each Front-end Processor every 10 seconds and this data is sent to the data warehouse).
The management of a system of this kind (with potentially hundreds of server nodes) must necessarily be fully automated; for the Haypenny system, that automation is called "HayMan".
HayMan is a management console and set of supporting servers that allow system personnel to manage every aspect of the Haypenny system, such as deploying new versions of subsystem software, starting and stopping processing pools, troubleshooting issues for each process, and viewing statistics for each relevant process. This automation allows, for instance, the deployment of an entire subsystem pool with a single click.
The Haypenny system requires a secure, robust and scalable hardware and networking infrastructure. Haypenny chose Amazon Web Services (AWS) as its infrastructure provider after carefully considering the alternatives. The winning factors for the AWS decision included:
The Haypenny transaction system consists of two write operations through a total of three APIs.
The transaction write operations are Split and Combine. These two operations are conceptual mirrors of each other.
The Split operation takes an existing block and an (optional) amount as input and returns a new block.
The Combine operation takes an existing block and another block (the "combine block"), along with an (optional) unit amount:
The Haypenny API is a REST API, and returned values are in JSON format. The calls consist of the following:
The system includes a concept called an "IBlock", which is short for "Info Block". An Info Block is a 44-character string provided automatically every time the tx.Info API call is made for a Block. That string can be used in subsequent tx.Info calls instead of the normal block string (with the "iblock" parameter instead of the "block" parameter), and the call will return the same block information as if it were called with the original block string.
This mechanism is useful for securely verifying the balance of a block as a third party, or for securely allowing a Combine operation into a block, e.g. for a client-side-only payment.
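As an illustration of the read path only, a third party holding an IBlock could verify a balance with a call along the lines of the libcurl sketch below. The host name, URL shape, and use of an HTTP GET are hypothetical; only the tx.Info call name and the "iblock" parameter come from the description above.

    #include <stdio.h>
    #include <curl/curl.h>

    int main(void) {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL *curl = curl_easy_init();
        if (!curl)
            return 1;

        /* Example only: the real host, path, and IBlock value will differ. */
        const char *url =
            "https://api.example.invalid/tx.Info?iblock=EXAMPLE_44_CHAR_IBLOCK_STRING";

        curl_easy_setopt(curl, CURLOPT_URL, url);
        CURLcode rc = curl_easy_perform(curl);   /* JSON reply goes to stdout */
        if (rc != CURLE_OK)
            fprintf(stderr, "tx.Info failed: %s\n", curl_easy_strerror(rc));

        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return rc == CURLE_OK ? 0 : 1;
    }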
The Haypenny paradigm treats Block identifiers as the equivalent of physical cash and thus the system's security profile must be at the level of businesses dealing with large amounts of cash, which is to say that the system must maintain an extremely diligent level of threat detection and mitigation.
The Haypenny approach to creating a secure system for customers is fivefold:
Haypenny starts its approach to securing its systems at the enterprise level: it will adhere to standards appropriate to financial institutions and select only contracted providers (viz. Amazon AWS) that adhere to these standards as well.
These standards call for, among other things:
Amazon AWS, Haypenny's cloud computing provider, itself adheres to these standards and hosts a number of prominent financial institutions that hold certifications for these standards.
Block IDs (exposed externally as a "block string") are stored in Haypenny's system in an encrypted state using an industry standard encryption method (AES). This means Haypenny personnel never see "live" Block IDs.
Periodically (on the order of once per day), various forms of analysis are performed on Haypenny's offline systems, analyzing all system transactions and balances as a whole. This analysis allows the detection of many forms of DDoS attack and of attempts to use the system fraudulently or inappropriately, and it also provides a mechanism to double-check all transactions and block balances.
After the offline balances are verified and reconciled against the permanent record of transactions, the online systems (Data Table Processors) are verified as having the correct balances for all blocks. This mechanism employs a strategy whereby a batch of blocks is halted for trading, updates to the offline systems are completed, and then each block balance is checked against the real-time system with an ordinary user-level (tx.Info) call.
Along with this checking, the entire system is occasionally halted temporarily (on the order of one minute) so that the process can verify that the active block count in the real-time systems aligns with the block count in the final system of record. The combination of these two audits ensures that no spurious blocks can be stored on the real-time systems.
Because transactions travel over the Internet, there is always a possibility of network failure between an end-user client and Haypenny servers. A mechanism must therefore be available to ensure that a user can recover a lost Block when a call executes successfully but its response is never received.
The assumptions behind this approach are:
Because of these assumptions, the mechanism to handle this case resides in the offline system tier.
For Split operations, the client creates a "secret number" that is presumed to be a cryptographically secure pseudo-random number. It is 64 bits in size.
The "secret number" is included in the Split call, and that number is logged along with the rest of the transaction data.
In the event that a user does not obtain a response from the server due to a network error (i.e., the operation completed on the Haypenny system but the response was not received), the user may make an API call providing the secret number and the Block string used in the original call, and subsequently receive the lost Block string in an email.
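A client-side sketch of generating the secret, assuming a Linux client using getrandom(); other platforms would draw from their own cryptographically secure source.

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/random.h>

    /* Draw the 64-bit "secret number" from the OS CSPRNG before issuing
       a Split call.                                                      */
    int make_split_secret(uint64_t *secret) {
        return getrandom(secret, sizeof *secret, 0) == sizeof *secret ? 0 : -1;
    }

    int main(void) {
        uint64_t secret;
        if (make_split_secret(&secret) != 0)
            return 1;
        /* The secret accompanies the Split call and is logged with the
           transaction, so a lost Block can later be recovered.           */
        printf("secret=%016llx\n", (unsigned long long)secret);
        return 0;
    }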
The first step in making a system perform is to define exactly what "performance" means in a given context.
The Haypenny system's primary performance metric is internal transaction latency. This metric is defined as the amount of time elapsed from when a user's request reaches Haypenny servers to when the response is initiated. (It is worth noting that internet latency is not taken into consideration in the Haypenny design, as it is both a straightforward problem and not entirely under Haypenny's control.)
Besides the time metric itself, another implied metric is consistency of response. From a user-experience standpoint, users often remember only the slowest response time for a given service, even if that time is unusual. As such, our latency metric is qualified: the highest latency within a given test is the one by which we measure our performance goals.
While low latency is desirable for the end user, beyond a certain point further reductions are no longer perceptible. Latency also determines the cost of the system, however: the longer resources are tied up responding to a request, the more it will cost.
A higher total time taken for each request on the Front-end Processor means more threads are required to service the same aggregate number of requests per second. An average response time of 1 ms means each thread can service 1,000 requests a second, so a Front-end Processor requires 20 threads to service 20,000 API calls per second, the design goal for the Front-end Processor. If the response time increases to 4 ms, then 80 threads are required, and so on. Threads do not scale linearly because of shared-resource contention, so only a finite number of threads is practical.
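The thread arithmetic above reduces to a one-line formula, ignoring the contention effects just noted:

    #include <math.h>     /* link with -lm */
    #include <stdio.h>

    /* Threads needed to sustain a target request rate at a given average
       per-request latency.                                               */
    static double threads_needed(double requests_per_sec, double latency_ms) {
        return ceil(requests_per_sec * (latency_ms / 1000.0));
    }

    int main(void) {
        /* Figures from the text: a 20,000 call/s Front-end Processor goal. */
        printf("1 ms -> %.0f threads\n", threads_needed(20000.0, 1.0));  /* 20 */
        printf("4 ms -> %.0f threads\n", threads_needed(20000.0, 4.0));  /* 80 */
        return 0;
    }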
The Haypenny system is designed to be a primary means of small payments for every single user of the Internet, every day. The system's architecture, therefore, includes the following assumptions about the ultimate scaling requirements:
Using the above assumptions, the following top-level scaling requirements were used in defining the architecture for the Haypenny system:
A key enabler of performance is to track as many metrics as possible. The Haypenny engine tracks over 40 real-time metrics covering everything from application-level activity to internal backend memory usage at a granular (object) level. All API calls to Haypenny are individually benchmarked, and a recent running-average execution time is kept. All metrics are continuously gathered by near-real-time systems and saved to a database for later reporting.
The Haypenny system has been extensively measured under a contrived load of random Split and Combine operations on a large number of Blocks.
Besides application-level benchmarks, measurements were taken at the various layers to isolate specific bits of software and hardware response times. Knowing the "absolute hardware speed" of the systems involved and comparing them to high-level benchmarks isolates the exact performance of the software itself and allows performance to be extrapolated to heavier loads.
Modern cloud computing provides a roughly linear way of scaling resources (within a very specific context, with caveats). You can purchase systems with various performance ratings, and those ratings can be tested with low-level benchmarks. Thus, if a ratio between the low-level benchmarks and the application-level benchmarks is established, a cost-performance curve can be created.
The Haypenny system's primary performance limitations ultimately come from two kinds of IO: network IO and non-volatile storage IO. A Haypenny Block operation (split or combine) requires four of each kind of IO: two network IOs to the Data Table Processors, two writes to non-volatile storage per Data Table Processor call, and one network IO per call to remote storage. Based on the cloud hardware purchased and how systems are configured, network IO operations can range from 20µs to 500µs, which can be measured separately. Non-volatile storage "hard write" operations (i.e., not buffered in any way) can range from approximately 50µs to 500µs depending on concurrent load. (The maximum estimates depend on the physical distance between data centers; the maximum numbers shown here assume a typical distribution of data centers within 100 kilometers of each other.)
Haypenny benchmarks revealed the following: