Rating: 8.5/10.
The book covers all aspects of networking and is useful for both front-end and back-end developers. It’s written at a level that doesn’t assume much prior networking knowledge, providing a high-level overview of protocols like TCP, HTTP, and WebSocket. It offers practical advice on what these protocol details imply for applications and how to design your apps with an understanding of the underlying protocols to optimize performance, even though most low-level details are usually handled by either the browser or the web server.
Chapter 1: Latency is the most important factor in networking performance, usually mattering more than bandwidth. Several things contribute to latency: the first is propagation time, which in fiber is about two-thirds the speed of light (roughly 1.5x slower); then there are processing delays and transmission time. Humans start to notice lag once it exceeds 100-200 ms. In many residential setups, the “last mile” from the ISP to the house contributes a large share of the total latency.
Fiber optic cable has far higher bandwidth than copper because it can multiplex 400+ wavelengths of light, which is why it’s used in the backbone of the internet. But while the backbone keeps getting faster, latency can’t beat the speed of light, so improvements have to come from other fixes like prefetching and caching.
Chapter 2: TCP provides a reliable channel, making sure data packets reach the receiver intact and in the order they were sent. It starts with a three-way handshake, which always adds a round trip of latency before any data flows; this makes it slow to start. To deal with overload, there are two mechanisms: flow control, which stops the sender from overwhelming the receiver, and congestion control, which tries to keep the intermediate nodes in the network from getting overloaded. The latter works by growing the congestion window exponentially (slow start): it begins small and roughly doubles every round trip until a packet is lost. So a fresh TCP connection will be slower for its first few hundred milliseconds, and that’s why you should reuse the same TCP connection whenever you can.
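As a rough illustration, here’s a minimal sketch (not from the book) of how slow start delays a transfer; the 10-segment initial window and 1460-byte segments are common defaults, not universal constants:

```ts
// Estimate how long slow start takes to deliver a payload, assuming
// the congestion window doubles every round trip and no losses occur.
function slowStartTimeMs(payloadBytes: number, rttMs: number): number {
  const segmentSize = 1460; // typical TCP payload per segment
  let cwndSegments = 10;    // assumed initial congestion window
  let rounds = 0;
  let delivered = 0;
  while (delivered < payloadBytes) {
    delivered += cwndSegments * segmentSize;
    cwndSegments *= 2;      // exponential growth during slow start
    rounds += 1;
  }
  return rounds * rttMs;
}

// A 64 KB response over a 56 ms RTT link needs several round trips
// of pure ramp-up, on top of the handshake:
console.log(slowStartTimeMs(64 * 1024, 56)); // -> 168 (3 round trips)
```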
There are algorithms for adjusting the window size both in the default case and when packets get lost; these algorithms are continually being improved, so keep your kernel version updated. TCP automatically resends any lost packets, so all the application sees is some variation in latency when this happens.
Chapter 3: UDP is pretty much just a thin layer over the IP protocol; one of the challenges it faces is Network Address Translation (NAT). NAT was originally meant as a temporary fix for the shortage of IP addresses, but it has become a permanent part of network infrastructure. This causes problems for UDP because UDP is stateless, while NAT is stateful: the NAT device must maintain a table mapping its public IP and ports to the private IPs and ports behind it. Many NAT devices expire these mappings after a timeout, so you often need to refresh them with periodic “keep-alive” packets.
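A minimal sketch of such a keep-alive, assuming a Node.js UDP socket; the peer address and the 15-second interval are illustrative, since actual NAT timeouts vary by device:

```ts
import dgram from "node:dgram";

const socket = dgram.createSocket("udp4");
const peer = { host: "203.0.113.10", port: 40000 }; // placeholder peer

// Send a tiny datagram periodically; any payload works, the point is
// just to refresh the NAT device's translation-table entry.
setInterval(() => {
  socket.send(Buffer.from("ka"), peer.port, peer.host);
}, 15_000);
```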
When you’re behind NAT, you can’t see your own public IP address and port, so you have to ask a STUN server for this information. If the NAT setup is too restrictive, you may need to fall back to relaying all traffic through a TURN server as a last resort. UDP also doesn’t manage congestion, so if you need that, you’ll end up reimplementing a big chunk of what TCP does, though this gives you more control.
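In a browser, the usual way to ask a STUN server for your public address is through the WebRTC stack. A hedged sketch, using Google’s public STUN server as an illustrative choice; the “srflx” (server-reflexive) ICE candidates carry the NAT-assigned public IP and port:

```ts
const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
});
pc.createDataChannel("probe"); // any channel, just to trigger ICE gathering

pc.onicecandidate = (e) => {
  // Server-reflexive candidates contain the public address seen by STUN.
  if (e.candidate?.candidate.includes("srflx")) {
    console.log("public address candidate:", e.candidate.candidate);
  }
};

pc.createOffer().then((offer) => pc.setLocalDescription(offer));
```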
Chapter 4: Transport Layer Security (TLS) starts with its own handshake after the TCP handshake, so you’re looking at three round trips in total before data flows, which adds a good amount of latency; reuse the connection whenever you can. Servers can also cache TLS sessions to shave one round trip off subsequent handshakes, at the cost of more server memory. The TLS handshake can also carry Application Layer Protocol Negotiation (ALPN), which lets the client and server agree on which application-layer protocol to use if it’s not plain HTTP.
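A minimal sketch of ALPN from Node.js, with an illustrative host; the client offers protocols inside the TLS handshake and reads back the server’s pick with no extra round trip:

```ts
import tls from "node:tls";

const socket = tls.connect(
  {
    host: "example.com", // illustrative host
    port: 443,
    ALPNProtocols: ["h2", "http/1.1"], // offered during the handshake
  },
  () => {
    // The negotiated protocol is known as soon as the handshake ends.
    console.log("negotiated:", socket.alpnProtocol); // e.g. "h2"
    socket.end();
  }
);
```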
For TLS to be secure, you need Certificate Authorities to sign website certificates and build a chain of trust; your browser ships with a list of these trusted authorities. The authorities also publish certificate revocation lists, but scaling revocation checks is problematic. Using TLS from an application is similar to using plain TCP, but it’s worth placing the server closer to the client because the extra handshake makes latency even more crucial. The chapter also covers other server-side tricks for making TLS faster.
Chapter 5: Wireless networks come in several types of technologies that differ in range, bandwidth, and latency, like Wi-Fi, Bluetooth, and LTE. Bandwidth is a big deal: by Shannon’s capacity theorem, doubling the width of the frequency band doubles the achievable data rate, while capacity grows only logarithmically with signal-to-noise ratio. The frequency spectrum is scarce and auctioned off at high prices. When different devices transmit on the same frequency, they interfere with each other and lower the signal-to-noise ratio. Factors like this interference and the modulation algorithm affect the actual throughput, so real-world performance will often be less than what you’d get under ideal conditions.
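A quick sketch of the formula with illustrative numbers, where capacity is C = B · log2(1 + S/N):

```ts
// Shannon capacity: linear in channel bandwidth B (Hz), logarithmic
// in signal-to-noise ratio (given here as a linear ratio, not dB).
function shannonCapacityBps(bandwidthHz: number, snrLinear: number): number {
  return bandwidthHz * Math.log2(1 + snrLinear);
}

// A 20 MHz channel at ~30 dB SNR (ratio of 1000):
console.log(shannonCapacityBps(20e6, 1000)); // ~199 Mbps theoretical ceiling
```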
Chapter 6: Wi-Fi works a lot like classic Ethernet in that it lets multiple devices transmit into a shared random-access channel, retrying with exponential backoff when transmissions collide, so for it to work well, the traffic load needs to be low. The power and frequency limits change depending on where you are, and device firmware makes sure you’re not breaking any regulations. The 5 GHz spectrum can carry more data than the 2.4 GHz one. Since Wi-Fi can be unpredictable in both latency and bandwidth, apps need to adapt; for example, adaptive bitrate streaming periodically measures the connection and adjusts video quality to match. Also, Wi-Fi is usually not metered like mobile data is, so it’s a good idea to offload high-bandwidth tasks to Wi-Fi whenever you can.
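A hedged sketch of that adaptation logic; the rendition ladder and the 20% safety margin are illustrative assumptions, not the book’s numbers:

```ts
const renditionsKbps = [400, 1200, 2500, 5000]; // hypothetical quality ladder

// Pick the highest rendition that fits the measured throughput,
// leaving headroom for Wi-Fi's variance.
function pickRendition(measuredKbps: number): number {
  const usable = measuredKbps * 0.8;
  const fitting = renditionsKbps.filter((r) => r <= usable);
  return fitting.length > 0 ? fitting[fitting.length - 1] : renditionsKbps[0];
}

console.log(pickRendition(3000)); // -> 1200 on a ~3 Mbps measurement
```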
Chapter 7. Mobile networks have evolved through several generations like 2G, 3G, and 4G, but none of these labels represents a single standard: each includes multiple, mutually incompatible standards while setting shared expectations for data rates and latency, and different technologies within each generation have been adopted in different regions. LTE aims to be the long-term standard, but switching to it from other technologies comes with costs; not only does the infrastructure need to be upgraded, but mobile devices also need to support the new standard. So a typical smartphone carries multiple radios to support different standards.
One big issue in mobile connectivity is power consumption. When the radio is at full power, it drains the battery extremely quickly, so it constantly switches between high-power and several lower-power states using a state machine. In lower power states, it listens for network broadcasts signaling an incoming message and then switches to a higher power state to retrieve the message. Because of these power states, the first packet usually has much longer latency since the device has to switch to a higher-power mode and establish a connection with the radio tower before sending any data.
Handoff negotiations also add a layer of complexity to mobile networks, especially when a device switches to a different radio tower due to movement or because a tower is overloaded. The core network sits between these radio towers and the external internet; it routes the packets and manages user information.
Chapter 8. Optimizing applications for mobile data use is important because constantly switching to a high-power state drains the battery fast. Issuing many small requests is costly, since each one forces the device into a higher power state and a fresh radio connection; it’s much better to batch updates together and hold low-priority requests until the radio is already active, then send them all in one go. Latency is often high, especially when the radio is idle, so the UI should respond quickly, within a few hundred milliseconds, even if the network can’t keep up. Don’t make assumptions about latency or bandwidth; measure them frequently and adjust the bitrate if needed.
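A minimal sketch of such batching; the flush interval and endpoint are hypothetical, and a real app would also flush when the radio is known to be active (e.g. piggybacking on a user-initiated request):

```ts
const queue: object[] = [];

function enqueue(update: object): void {
  queue.push(update); // low-priority updates wait here
}

// One request for the whole batch instead of waking the radio per update.
async function flush(endpoint: string): Promise<void> {
  if (queue.length === 0) return;
  const batch = queue.splice(0, queue.length); // drain the queue
  await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(batch),
  });
}

setInterval(() => flush("/api/batch"), 60_000); // hypothetical endpoint
```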
Chapter 9. The history of HTTP starts with HTTP 0.9, which is really simple; it’s more like a toy project that can be described in a few sentences and was only designed to send HTML. HTTP 1.0, proposed in 1996, took things more seriously and added multiple headers and different response types. HTTP 1.1 came out in 1999 and included a bunch of performance optimizations like keep-alive connections; it’s still widely used today. HTTP 2.0 was announced in 2012, but at the time the book was written (2013), it hadn’t been finalized yet.
Chapter 10. In web performance, the most crucial metric is page load time: the time from when you start loading the page until the loader stops spinning and everything is loaded. The browser resolves a complex dependency graph when loading a page, and importantly, CSS should be loaded as early as possible since it blocks rendering and much else. Rendering a single page often means fetching on the order of a hundred resources and over 1 MB of data, and these numbers are steadily increasing. Ideally, the page should load in under 100 to 200 milliseconds; any longer and the delay becomes noticeable. The resource waterfall graph is useful for visualizing network performance.
To get accurate data, capture the real timing object from the user’s browser, because synthetic tests usually aren’t representative. Bandwidth is generally much less of an issue than latency: beyond roughly 5 Mbps, extra bandwidth barely improves page load time, while improving latency keeps paying off. Whenever possible, use hints to assist browser-side optimizations, like hints to prefetch resources.
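For the real-user measurement part, one modern option is the Navigation Timing API; a minimal sketch, with a hypothetical beacon endpoint:

```ts
const [nav] = performance.getEntriesByType(
  "navigation"
) as PerformanceNavigationTiming[];

if (nav) {
  const metrics = {
    dns: nav.domainLookupEnd - nav.domainLookupStart,
    tcp: nav.connectEnd - nav.connectStart,
    ttfb: nav.responseStart - nav.requestStart, // time to first byte
    load: nav.loadEventEnd - nav.startTime,     // full page load time
  };
  // Ship the real user's numbers home for analysis:
  navigator.sendBeacon("/rum", JSON.stringify(metrics)); // hypothetical endpoint
}
```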
Chapter 11. HTTP 1.x brought a number of performance improvements. Keep-alive connections let us reuse an existing TCP connection for multiple assets, avoiding a fresh handshake each time. Pipelining lets the client send several requests without waiting for each response, but its adoption has been limited, mostly because HTTP 1.x has no way to multiplex responses: one large or slow resource blocks everything queued behind it, known as the head-of-line blocking problem.
Browsers work around this by opening multiple parallel TCP connections per host, usually six by default. Another important factor is header overhead, including cookies; for small requests, the headers can outweigh the actual payload. Bundling assets together can be more network-efficient, but it can sometimes backfire; for example, bundling images can lead to higher memory usage.
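A hedged sketch of connection reuse from Node.js, with an illustrative host: a keep-alive agent lets sequential requests share one TCP (and TLS) handshake, and maxSockets mirrors the typical per-host browser limit:

```ts
import https from "node:https";

const agent = new https.Agent({
  keepAlive: true, // reuse sockets across requests
  maxSockets: 6,   // cap parallel connections per host
});

function get(path: string): Promise<number> {
  return new Promise((resolve, reject) => {
    https
      .get({ host: "example.com", path, agent }, (res) => {
        res.resume(); // drain the body so the socket can be reused
        resolve(res.statusCode ?? 0);
      })
      .on("error", reject);
  });
}

async function main(): Promise<void> {
  await get("/a.css");
  await get("/b.js"); // rides the already-warmed connection
}
main();
```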
Chapter 12. HTTP 2.0 brings further performance improvements and is designed so that all existing applications keep working without modification. It replaces plaintext with a binary framing layer, splitting data into many small frames; this enables multiplexing multiple streams over a single connection. Unlike HTTP 1.x, you don’t have to open multiple TCP connections or suffer the head-of-line blocking problem; a single HTTP 2.0 connection is enough.
It also allows the browser to send priority hints to the server, and the server can push resources it knows the client will need later, before they’re requested, as long as the push is tied to a request the client initiated. There’s also header compression, where only the differences from previously sent header fields go over the wire instead of the full set for every request.
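A hedged sketch of server push using Node’s http2 module; the certificate paths and file names are placeholders:

```ts
import http2 from "node:http2";
import fs from "node:fs";

const server = http2.createSecureServer({
  key: fs.readFileSync("server-key.pem"),   // placeholder paths
  cert: fs.readFileSync("server-cert.pem"),
});

server.on("stream", (stream, headers) => {
  if (headers[":path"] === "/") {
    // Push the stylesheet before the client asks for it:
    stream.pushStream({ ":path": "/style.css" }, (err, pushStream) => {
      if (!err) pushStream.respondWithFile("style.css");
    });
    stream.respondWithFile("index.html");
  }
});

server.listen(8443);
```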
It will take time for servers and browsers to move from 1.1 to 2.0. With TLS, the ALPN negotiation handles this transition; otherwise, HTTP 1.1’s Upgrade mechanism can switch the connection to 2.0.
Chapter 13. Summarizes a bunch of performance recommendations for using either HTTP 1.x or 2.0. Many of the guidelines differ depending on which version you’re using; it also gives some tips on deploying both 1.x and 2.0 on the same server.
Chapter 14. The browser handles and manages network resources, enforces security restrictions, and provides a sandbox environment for scripts; this way, your application doesn’t have to worry about TCP and UDP connections.
Chapter 15. XMLHttpRequest (XHR) is the key technology behind Ajax, making network calls available to JavaScript; before it, web pages had to do a full refresh to make any network call. Despite its name, XHR isn’t tied to XML and can transmit any data format. Cross-Origin Resource Sharing (CORS) is a security mechanism where a remote server must opt in to cross-origin requests; if the server doesn’t set the right header, the browser blocks the response. Sometimes a preflight request is used to avoid sending sensitive data to servers that aren’t CORS-aware: the browser first sends the preflight, and only if the server responds correctly does it proceed with the complete request, including sensitive header information.
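A minimal sketch of that opt-in from the server side in Node; the allowed origin is illustrative:

```ts
import http from "node:http";

http
  .createServer((req, res) => {
    // Without this header, the browser blocks the cross-origin response.
    res.setHeader("Access-Control-Allow-Origin", "https://app.example.com");
    if (req.method === "OPTIONS") {
      // Preflight: declare which methods and headers are allowed.
      res.setHeader("Access-Control-Allow-Methods", "GET, POST");
      res.setHeader("Access-Control-Allow-Headers", "Content-Type");
      res.end();
      return;
    }
    res.end(JSON.stringify({ ok: true }));
  })
  .listen(8080);
```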
XHR is a simple way to upload and download data, including Blobs, and progress events fire along the way so a loading bar can be updated; however, it doesn’t support streaming data as it arrives. As a result, the main strategy for receiving updates from the server is polling, where the client repeatedly checks for new data. Long polling holds a connection open until an update is available, which offers better latency but isn’t always more efficient, especially if updates are frequent.
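A minimal sketch of a long-polling client; the endpoint is hypothetical, and the server side would hold each request open until it has something to say:

```ts
async function longPoll(url: string, onUpdate: (data: unknown) => void) {
  for (;;) {
    try {
      const res = await fetch(url); // server delays the reply until an update exists
      if (res.ok) onUpdate(await res.json());
    } catch {
      await new Promise((r) => setTimeout(r, 1000)); // back off on errors
    }
  }
}

longPoll("/api/updates", (data) => console.log("update:", data));
```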
Chapter 16: Server-Sent Events (SSE) use the EventSource interface: the browser opens a connection to the server and triggers callbacks as events arrive; the server must respond with the “text/event-stream” content type and can send plain-text messages in a simple newline-delimited format. Any plain-text payload may be sent, but binary data must be encoded in base64; optionally, each message can carry an ID so the browser can automatically recover from dropped connections.
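A minimal sketch of the client side; the endpoint is hypothetical:

```ts
const source = new EventSource("/events"); // hypothetical endpoint

source.onmessage = (e: MessageEvent) => {
  console.log("data:", e.data);           // plain-text payload
  console.log("last id:", e.lastEventId); // lets the browser resume after a drop
};

source.onerror = () => {
  // EventSource reconnects automatically; this just logs the hiccup.
  console.warn("connection lost, retrying automatically");
};
```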
Chapter 17: WebSocket is a protocol that enables bi-directional communication between client and server, supporting both binary and text data; received data is handled according to its type, with binary data delivered as an ArrayBuffer. The “send” operation returns immediately, queuing the data, and messages are delivered in the order they were sent. The protocol has a binary framing layer, so large messages are split into multiple frames, and the application gets a callback only once the entire message has arrived; however, WebSocket can’t multiplex different messages, so a large message causes the head-of-line blocking problem. It’s recommended to split these into smaller messages, although HTTP 2.0’s framing avoids this issue. There are some inefficiencies with WebSocket, such as difficulty with compression and caching; it’s therefore not advised to send image data that should be cached by a CDN, and issues can arise if the connection goes through a proxy service, so it’s recommended to always use a TLS tunnel.
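A minimal sketch of the browser API; the URL is hypothetical, and wss:// gives the TLS tunnel the chapter recommends:

```ts
const ws = new WebSocket("wss://example.com/socket");
ws.binaryType = "arraybuffer"; // deliver binary messages as ArrayBuffer

ws.onopen = () => {
  ws.send("hello");                   // text message
  ws.send(new Uint8Array([1, 2, 3])); // binary message
  // send() returns immediately; bufferedAmount shows what's still queued.
  console.log("queued bytes:", ws.bufferedAmount);
};

ws.onmessage = (e: MessageEvent) => {
  if (e.data instanceof ArrayBuffer) {
    console.log("binary message,", e.data.byteLength, "bytes");
  } else {
    console.log("text message:", e.data);
  }
};
```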
Chapter 18: WebRTC is a set of protocols and APIs for carrying audio and video data over peer-to-peer connections, with the browser handling decoding and the algorithms that cope with network issues; a media stream can carry multiple audio and video tracks and serves as both an input and an output source for real-time delivery. The getUserMedia function requests audio and video input from the browser. Humans are sensitive to latency, but dropped frames are less noticeable, so UDP is used for its better latency, even though it’s less reliable. Because of UDP, the stack includes its own mechanisms for NAT traversal and flow control.
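A minimal sketch using the modern promise-based form of the API:

```ts
async function capture(): Promise<MediaStream> {
  // Prompts the user for camera and microphone permission.
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: true,
  });
  console.log("tracks:", stream.getTracks().map((t) => t.kind));
  return stream; // can feed an RTCPeerConnection or a <video> element
}
```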
To establish a peer-to-peer connection, a signaling server is required to initiate the connection, since clients can’t reach each other directly at first; SDP (Session Description Protocol) messages exchanged over the signaling channel synchronize stream metadata, with the browser handling the connection details. For secure communication, a TLS-style handshake is needed, which is challenging over UDP, so the DTLS protocol adapts the TLS handshake to datagrams; SRTP then manages media delivery over a binary channel, and a family of other protocols handles control data.
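A hedged sketch of that setup; sendViaSignalingChannel is a placeholder for whatever transport the app uses (e.g. a WebSocket to the signaling server), and the STUN server choice is illustrative:

```ts
declare function sendViaSignalingChannel(msg: object): void; // app-specific placeholder

const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
});

// ICE candidates (possible network paths) also travel via signaling.
pc.onicecandidate = (e) => {
  if (e.candidate) sendViaSignalingChannel({ candidate: e.candidate });
};

async function startCall(stream: MediaStream): Promise<void> {
  stream.getTracks().forEach((t) => pc.addTrack(t, stream));
  const offer = await pc.createOffer(); // SDP describing our media
  await pc.setLocalDescription(offer);
  sendViaSignalingChannel({ sdp: offer }); // the peer answers the same way
}
```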
In media applications, bandwidth constraints often outweigh latency concerns; upload bandwidth is usually much lower than download bandwidth, and a single HD stream can consume all available bandwidth. This creates challenges for multi-party architectures, as there isn’t enough bandwidth to stream from multiple sources simultaneously, like in a group call, making them more complicated to build than simple peer-to-peer audio-visual apps and often requiring a proxy server architecture.