9.1 Emerging Web Architecture Trends

Web architectures were considered in chapter 3. The entire architecture is designed around the HTTP protocol; if the protocol changes over time, this may also affect the architecture. Furthermore, the design and internals of the components of a web architecture may be influenced by other trends as well.

The Future of the HTTP Protocol

When the web was getting increasingly popular in the mid and late nineties, the final HTTP/1.1 specification was published under a lot of pressure. This has led to several inaccuracies, ambiguous statements and artifacts in the resulting standard [Not12]. Consequently, the IETF constituted the HTTPbis Working Group, which is responsible for maintaining the standard. The working group not only collects problems with the current standard, but also revises and clarifies the specification, especially with regard to conditional requests, range requests, caching and authentication. It is planned that these improvements eventually transition into the HTTP/2.0 specification.

Extension Protocols to HTTP/1.1

Besides misconceptions in the HTTP/1.1 specification, other problems and weaknesses of HTTP/1.1 have become apparent in the field. Common points of criticism include performance issues, high latencies, verbose message formats and weak support for extensions such as custom HTTP authentication mechanisms. The rigid client-initiated request/response cycle has also been criticized, because highly interactive web applications often require server-initiated communication as well. This has not only been addressed by the recent WebSocket protocol [Fet11], but also by specific HTTP protocol extensions such as Full-Duplex HTTP [Zhu11]. Similar HTTP protocol extensions like HTTP-MPLEX [Mat09] integrate multiplexing and header compression into the protocol.

The waka Protocol

Fielding, who introduced the REST architectural style [Fie00], has also identified drawbacks of HTTP when using REST as an integration concept for large enterprise architectures. These drawbacks include head-of-line blocking of pipelined requests and legacy issues with verbose message headers. Fielding also notes the absence of unsolicited responses and the need for better messaging efficiency, especially for low-power and bandwidth-sensitive devices. Consequently, he has started work on waka, a token-based, binary protocol intended to replace HTTP. It is deployable via HTTP, using the Upgrade header, and introduces new methods on resources such as RENDER or MONITOR. Request and transaction identifiers decouple request and response messages, allowing more loosely coupled communication patterns. Waka can be used with different transport protocols and is not limited to TCP. Besides binary communication, waka uses interleaved messages for better performance. Fielding is still working on the waka specification, and no draft is available yet.
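The following minimal sketch illustrates this in-band deployment path: a client uses the standard HTTP/1.1 Upgrade mechanism to ask a server to switch protocols. Since no waka draft has been published, the protocol token "waka" and the assumed server behavior are purely illustrative.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class WakaUpgradeSketch {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("example.org", 80)) {
            OutputStream out = socket.getOutputStream();
            String request =
                "GET / HTTP/1.1\r\n" +
                "Host: example.org\r\n" +
                "Upgrade: waka\r\n" +          // hypothetical protocol token
                "Connection: Upgrade\r\n" +
                "\r\n";
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // A server willing to switch would answer with
            // "HTTP/1.1 101 Switching Protocols"; otherwise the
            // connection simply continues to speak HTTP/1.1.
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), StandardCharsets.US_ASCII));
            System.out.println(in.readLine());
        }
    }
}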

SPDY

Other efforts towards more efficient web protocols have emerged in industry. The most popular initiative is led by Google and works on an application-layer protocol called SPDY [Bel09]. SPDY focuses on the efficient transport of web content, mainly by reducing request latencies. In HTTP/1.1, performance (i.e. throughput) can be increased by using multiple persistent connections and pipelining. According to the SPDY protocol designers, this connection concurrency is responsible for the increased latencies of complex web sites. They argue that a single connection between client and server is more efficient when combined with multiplexed streams, request prioritization and HTTP header compression. Advanced features of SPDY include server-initiated pushes and built-in encryption. SPDY is designed for TCP as the underlying transport protocol. It is already in use by Amazon, Twitter and Google services, and the browsers Google Chrome and Mozilla Firefox provide client-side protocol implementations. Microsoft has suggested an alternative HTTP replacement called HTTP Speed+Mobility [For12]. It incorporates concepts of SPDY and the WebSocket protocol, but is enriched with optional protocol extensions for mobile devices.
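As a small illustration of one of these techniques, the following sketch compresses a redundant header block with zlib, as SPDY does before framing headers onto the shared connection. It is a simplification: real SPDY additionally seeds the compressor with a predefined dictionary and keeps the compression context alive across frames, both of which are omitted here.

import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class HeaderCompressionSketch {
    public static void main(String[] args) {
        // a typical, highly redundant request header block
        String headers =
            "method: GET\n" +
            "scheme: https\n" +
            "host: www.example.org\n" +
            "accept: text/html,application/xhtml+xml\n" +
            "user-agent: Mozilla/5.0 (X11; Linux x86_64)\n";

        byte[] input = headers.getBytes(StandardCharsets.US_ASCII);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();

        byte[] buffer = new byte[1024];
        int compressedLength = deflater.deflate(buffer);
        deflater.end();

        System.out.printf("headers: %d bytes raw, %d bytes compressed%n",
                input.length, compressedLength);
    }
}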

HTTP/2.0

At the time of writing in early 2012, the HTTPbis Working Group has been rechartered and is about to start work on an upcoming HTTP/2.0 draft [Not12]. We have seen recurring concepts in the different protocols above. Fortunately, most of these ideas have been acknowledged by the HTTPbis group, and the respective protocol designers are in close dialog. Thus, the aforementioned protocols may influence the design of HTTP/2.0 and accelerate the HTTP/2.0 specification process.

New Approaches to Persistence

The NoSQL hype has shaken up the database world and has given developers and architects a more versatile toolbox for persistence. Still, relational database systems are the most prominent and popular choice for persistence. However, there is general criticism of the core architecture of current RDBMSs.

Future Architectures of RDBMS

Stonebraker et al. [Sto07] point out that RDBMSs still carry the legacy architecture of the first relational database systems such as System R. These systems were designed mainly for the business data processing of their time, not as a general-purpose solution for all kinds of persistence. Moreover, different hardware architectures prevailed back then, and interactive command line interfaces constituted the primary user interface for database access. In order to provide high performance and throughput on the machines available at that time, traditional concepts such as disk-based storage and indexing structures, locking-based concurrency control and log-based failure recovery were developed and implemented. Latency was hidden by extensive use of multithreading. Although these concepts have been complemented with other technologies over time, they still represent the core architecture of every RDBMS available today.

Stonebraker argues that this architecture is no longer appropriate, especially when RDBMSs are used in a "one-size-fits-all" manner for many different kinds of persistence applications. According to Stonebraker, a "complete rewrite" is necessary in order to provide high-performance architectures for specialized database applications with distinct requirements. Only a rewrite would make it possible to abandon architectural legacy concepts and to realign with hardware trends such as multi-core architectures and large main memories. Furthermore, Stonebraker campaigns for a better integration of database access and manipulation into the programming model, without exposed intermediate query languages like SQL. In their work [Sto07], they introduce a prototype of such a new database system. The system runs multiple single-threaded instances without any communication and resides entirely in main memory. It is still a row-oriented relational database and provides full ACID semantics. It appears that not just NoSQL databases will provide more specialized solutions for different usage scenarios. If Stonebraker turns out to be right, we will also see further diversification of relational database management systems and engines for online transaction processing.

In-Memory Database Systems

The idea of in-memory database systems [Gm92] and hybrid in-memory/on-disk systems is becoming increasingly popular, as the performance gains of new CPUs, main memory components and hard disk technologies drift further apart. When low latencies are required, for example in interactive and real-time applications, in-memory database systems represent an attractive solution. When durability is required, they can be combined with asynchronous, non-volatile disk-based persistence.
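A minimal sketch of such a hybrid design is shown below: reads and writes are served entirely from main memory, while a background thread asynchronously appends changes to a disk-based log for durability. The class and the log format are illustrative and not taken from any particular database system.

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.concurrent.*;

public class HybridStoreSketch {
    private final ConcurrentHashMap<String, String> memory = new ConcurrentHashMap<String, String>();
    private final BlockingQueue<String> logQueue = new LinkedBlockingQueue<String>();

    public HybridStoreSketch(Path logFile) throws IOException {
        final BufferedWriter log = Files.newBufferedWriter(logFile,
                StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        Thread writer = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {               // asynchronous, non-volatile persistence
                        log.write(logQueue.take());
                        log.newLine();
                        log.flush();
                    }
                } catch (IOException e) {
                    // a real system would fail over or retry here
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        writer.setDaemon(true);
        writer.start();
    }

    public void put(String key, String value) {
        memory.put(key, value);                  // in-memory write, low latency
        logQueue.add(key + "=" + value);         // durability is achieved eventually
    }

    public String get(String key) {
        return memory.get(key);                  // served entirely from main memory
    }
}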

Event Sourcing and CQRS

Others question more fundamentally the way we handle and persist mutable state in applications. An alternative paradigm that is receiving increasing attention is the combination of two patterns: event sourcing [Fow05] and CQRS [Dah09].

Figure 9.1: A very simplified illustration of the flow control in an architecture that uses CQRS and event sourcing.

The underlying idea of event sourcing is to capture all state changes explicitly as domain events. Thus, the current application state becomes the result of the sequence of all events captured so far. Events are stored in a durable event log (e.g. a database). Instead of changing and updating existing state in a database, new events are emitted when application actions take place. This has several benefits. For instance, listeners can easily be attached to the stream of events and react to specific events or groups of events in an asynchronous manner. The main benefit of event sourcing is the ability to replay state evolution: since all events are persisted, the application can rebuild any prior state by reapplying the events from the event log. Even alternative orderings or the effect of additionally injected events can be analyzed. Event sourcing also supports snapshotting, since events natively represent incremental updates.
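The following minimal sketch illustrates the pattern for a trivial domain, a bank account chosen purely for illustration: state changes are only ever appended to an event log, and any prior balance can be rebuilt by replaying a prefix of the log.

import java.util.ArrayList;
import java.util.List;

public class EventSourcingSketch {

    // a domain event; immutable once created
    static final class AmountDeposited {
        final long cents;
        AmountDeposited(long cents) { this.cents = cents; }
    }

    // the durable event log; a database table or append-only file in practice
    static final List<AmountDeposited> eventLog = new ArrayList<AmountDeposited>();

    static void deposit(long cents) {
        eventLog.add(new AmountDeposited(cents));    // emit, never update in place
    }

    // any prior state can be rebuilt by replaying a prefix of the log
    static long balanceAfter(int eventCount) {
        long balance = 0;
        for (AmountDeposited event : eventLog.subList(0, eventCount)) {
            balance += event.cents;
        }
        return balance;
    }

    public static void main(String[] args) {
        deposit(500);
        deposit(1200);
        deposit(-300);
        System.out.println("current balance: " + balanceAfter(eventLog.size()));
        System.out.println("balance after first event: " + balanceAfter(1));
    }
}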

Traditional persistence layers provide a single model for reading, reporting, searching and transactional behavior. The CQRS pattern decouples these concerns, namely command operations and query operations, by using separate models. This separation of concerns improves scalability and integrates eventually consistent behavior into the application model.

When both patterns are combined, as shown in figure 9.1, command operations emit new events, which are appended to the event log. Based on the event log, the current application state is built and can be queried, entirely decoupled from the commands. This encapsulated approach provides interesting scalability properties and may find its way into future web architectures.
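The sketch below mirrors this flow under the same illustrative assumptions as before: the command side only appends events to the log, while a separate projector thread consumes the log asynchronously and maintains a queryable read model. Queries never touch the command path, which is what makes both sides independently scalable and the read model eventually consistent.

import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class CqrsSketch {

    static final class ItemRenamed {
        final String itemId;
        final String newName;
        ItemRenamed(String itemId, String newName) {
            this.itemId = itemId;
            this.newName = newName;
        }
    }

    // the event log connecting both sides; durable in a real system
    static final BlockingQueue<ItemRenamed> eventLog = new LinkedBlockingQueue<ItemRenamed>();

    // command side: emit an event; no synchronous state update
    static void renameItem(String itemId, String newName) {
        eventLog.add(new ItemRenamed(itemId, newName));
    }

    // query side: a projection of the event stream, queried independently
    static final Map<String, String> namesById = new ConcurrentHashMap<String, String>();

    public static void main(String[] args) throws InterruptedException {
        Thread projector = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        ItemRenamed event = eventLog.take();   // consume the log
                        namesById.put(event.itemId, event.newName);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        projector.setDaemon(true);
        projector.start();

        renameItem("42", "emerging trends");       // command operation
        Thread.sleep(100);                         // the read model catches up eventually
        System.out.println(namesById.get("42"));   // query against the read model
    }
}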

Tackling Latencies of Systems and Architectures

Coping with latencies is one of the big issues in large-scale architectures and distributed systems. Increasingly complex multi-core CPU architectures with multiple cache levels and sophisticated optimization mechanisms have also widened the latency gap between local operations. Table 9.1 shows the vast differences in latency between various local and remote operations. When designing and implementing low-latency systems, it is essential to take these numbers into account, both locally and for distributed operations. For instance, a single disk seek takes as long as roughly 100,000 main memory references. The fact that hundreds of machines in a web architecture may cooperate to answer a single request should not become apparent to the user through high response latencies.

Operation                                  Latency
L1 cache reference                         0.5 ns
Branch mispredict                          5 ns
L2 cache reference                         7 ns
Mutex lock/unlock                          25 ns
Main memory reference                      100 ns
Compress 1K bytes w/ cheap algorithm       3,000 ns
Send 2K bytes over 1 Gbps network          20,000 ns
Read 1 MB sequentially from memory         250,000 ns
Round trip within same datacenter          500,000 ns
Disk seek                                  10,000,000 ns
Read 1 MB sequentially from disk           20,000,000 ns
Send packet CA->Netherlands->CA            150,000,000 ns

Table 9.1: Common operations and their average latencies. Note that these numbers are exemplary and platform-dependent. Yet, they give a rough impression of the relative impact of different operations. Source: talk "Building Software Systems at Google and Lessons Learned" by Jeffrey Dean (Google) at Stanford.

The classical model of a von Neumann architecture running sequential executions may still provide a theoretical abstraction for programmers, but modern hardware architectures have slowly diverged from it. For tackling latency locally, it is important to understand the implications and properties of modern CPU architectures, operating systems and runtime environments such as the JVM. This notion, sometimes referred to as "mechanical sympathy" [Tho11], has renewed interest in approaches like cache-oblivious algorithms [Fri99]. These algorithms take into account the properties of the memory hierarchy of a CPU; they favor cache-friendly designs over designs that solely reflect computational complexity. A recent example is the Disruptor [Tho11], a high-performance queue replacement for data exchange between concurrent threads. The Disruptor essentially uses a concurrent, pre-allocated ring buffer with custom barriers for producer/consumer coordination. It avoids locks and takes heavy advantage of CPU caching.
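The following deliberately simplified sketch shows a single-producer/single-consumer ring buffer in the spirit of the Disruptor. The real Disruptor adds batching, cache-line padding and configurable waiting strategies, all of which are omitted here; only the core idea remains: pre-allocated slots and lock-free coordination via two sequence counters.

import java.util.concurrent.atomic.AtomicLong;

public class RingBufferSketch {
    private final long[] slots;                    // pre-allocated once, reused forever
    private final int mask;                        // capacity must be a power of two
    private final AtomicLong produced = new AtomicLong(0);
    private final AtomicLong consumed = new AtomicLong(0);

    public RingBufferSketch(int capacity) {
        this.slots = new long[capacity];
        this.mask = capacity - 1;
    }

    // single producer: claim the next slot, write, then publish the sequence
    public void publish(long value) {
        long seq = produced.get();
        while (seq - consumed.get() >= slots.length) {
            Thread.yield();                        // buffer full: spin, no lock taken
        }
        slots[(int) (seq & mask)] = value;
        produced.set(seq + 1);                     // volatile write publishes the slot
    }

    // single consumer: wait until the producer is ahead, then read the slot
    public long take() {
        long seq = consumed.get();
        while (produced.get() <= seq) {
            Thread.yield();                        // buffer empty: spin
        }
        long value = slots[(int) (seq & mask)];
        consumed.set(seq + 1);                     // frees the slot for the producer
        return value;
    }
}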

The latency of remote operations is more difficult to tackle, as network latencies are ultimately bounded by physical constraints. Efficient infrastructure layouts for data centers are, of course, a prerequisite. However, Rumble et al. claim that "it's time for low latency" in the network [Rum11] in general. They demonstrate that between the years 1983 and 2011, network latency has improved far more slowly (~ 32x) than CPU speed (> 1,000x), memory size (> 4,000x), disk capacity (> 60,000x) or network bandwidth (> 3,000x). According to their analysis, this is mainly caused by legacy technology stacks. Rumble et al. argue that round-trip times of 5-10 μs will actually be possible within a few years. New network interface controllers already pave the way for such latencies, but current operating systems still represent the major obstacle: without an entire redesign, traditional network stacks are not capable of such low latencies. Rumble et al. also note that this change requires new network protocols designed with such low latencies in mind. Once available, low-latency networking may partially change distributed computing, as it narrows the gap between local and remote operations and provides better temporal transparency.