The weirdly obscure art of Streamed HTML


Taylor Hunt

Posted on March 15, 2022


My goal from last time: reuse our existing APIs in a demo of the fastest possible version of our ecommerce website… and keep it under 20 kilobytes.

I decided this called for an MPA. (aka a traditional web app. Site. Thang. Not-SPA. Whatever.)

And with that decision, I doomed the site to feel slow and clunky

In theory, there’s no reason MPA interactions need be as slow as commonly encountered. But in practice, there are many reasons.

Here’s an example. At Kroger, product searches take two steps:

  1. Send user’s query to an API to get matching product codes
  2. Send those product codes to an API to get product names, prices, etc.

Using those APIs to generate a search results page would look something like this:



// Step 1: send the user’s query to an API to get matching product codes
const searchResponse = await fetch(`/api/search?${new URLSearchParams({
    query: usersSearchString
  })}`)
const { upcs } = await searchResponse.json()

// Step 2: only after the search returns can we look up names, prices, etc.
const detailsResponse = await fetch(
  `/api/products/details?${new URLSearchParams({ upcs })}`
)
const resultsData = await detailsResponse.json()

res.writeHead(resultsData.success ? 200 : 500, {
  'content-type': 'text/html;charset=utf-8'
})

const htmlResponse = searchPageTemplate.render({
    searchedQuery: usersSearchString,
    results: resultsData
  })

res.write(htmlResponse)
res.end()



Each fetch takes time, and /api/products/details only happens after /api/search finishes. Plus, those requests traveled from my computer to the datacenter and back. A real server would sit in the same datacenter as those APIs, making its calls much faster.

But on my demo machine, the calls usually took ~200 milliseconds, sometimes spiking as high as 800ms. Combined with the target 3G network, server processing time, and other inconvenient realities, I was frequently flouting the 100–1000ms limit for “natural and continuous progression of tasks”.

So the problem is high Time to First Byte, huh?

No worries! High Time to First Byte (TTFB) is a known performance culprit. Browser devtools, Lighthouse, and other speed utensils all warn about it, so there’s lots of advice for fixing it!

Except, none of the easily-found advice for improving TTFB helps:

Optimize the server application to prepare pages faster

Node.js spent 30ms or less handling the request and sending HTML. Very little to be gained there.

Optimize database queries or migrate to faster database systems

I was not allowed to touch our databases or API servers.

Upgrade server hardware to have more memory or CPU

This ran on a MacBook Pro with plenty of unused RAM and CPU.

Cache the expensive lookups

Caching can’t help the first requests, unless I precache all known endpoints or something. Even then, that wouldn’t work for search: users can and will search for strings never seen before.

The problem: web performance is other people

If only two API calls were a struggle, I was in for ruination. Here are some data sources our homepage uses:

  • Authentication and user info
  • Items in shopping cart
  • Selected store, pickup vs. delivery, etc.
  • Recommended products
  • Products on sale
  • Previous purchases
  • Sponsored products
  • Location-specific promotions
  • Recommended coupons
  • A/B tests
  • Subscription to Kroger Boost
  • …and so on. You get it, there’s a lot — and that’s only the stuff you can see.

As at many large companies, each data source may be owned by a different team, with its own schedules, SLAs, and bugs.

[Image: an unamused man in front of a whiteboard covered in scads of shapes with silly labels like “Magic Baby” and “Hell Proxy”, festooned with arrows pointing every which way.]

After you see real API charts, Krazam’s satirical microservices diagram gets either more or less funny. Still figuring out which.

Let’s say the 10 data sources I listed are each one API call. What are the odds my server can respond quickly enough?

Let’s say 1 client request creates 10 downstream requests to a longtail-latency affected subsystem. And assume it has a 1% probability of responding slowly to a single request. Then the probability that at least 1 of the 10 downstream requests are affected by the longtail latencies is equivalent to the complement of all downstream requests responding fast (99% probability of responding fast to any single request) which is:

1 - (0.99)^10 ≈ 0.095

That’s 9.5 percent! This means that the 1 client request has an almost 10 percent chance of being affected by a slow response. That is equivalent to expecting 100,000 client requests being affected out of 1 million client requests. That’s a lot of members!

Who moved my 99th percentile latency?

And since users visit multiple pages in MPAs, the chances of suffering a high TTFB approaches “guaranteed”:

Gil walks through a simple, hypothetical example: a typical user session involves five page loads, averaging 40 resources per page. How many users will not experience something worse than the 95th percentile? 0.003%.

Everything You Know About Latency Is Wrong
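Both figures fall out of the same independence assumption, and a few lines of arithmetic confirm them:

// Probability that at least one of n downstream calls is slow,
// if each call is independently slow with probability p:
const anySlow = (n, p) => 1 - (1 - p) ** n

console.log(anySlow(10, 0.01))  // ≈ 0.0956, the ~9.5% above

// Gil’s example: 5 page loads × 40 resources = 200 chances to hit
// worse-than-95th-percentile latency. The share of users who dodge
// every one of them:
console.log((1 - 0.05) ** 200)  // ≈ 0.000035, the quoted 0.003%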

I believe this is why Kroger.com used a SPA in the first place — if disparate teams’ APIs can’t be trusted, at least they won’t affect other teams’ code. (Similar insulation from other teams’ components is probably one reason for React’s industry adoption.)

The solution: streamed HTML

It’s easier to show than to explain:

Both pages show search results in 2.5 seconds. But they sure don’t feel the same.

Not all sites have my API bottlenecking issue, but many have its cousins: database queries and reading files. Showing pieces of a page as data sources finish is useful for almost any dynamic site. For example…

  • Showing the header before potentially-slow main content
  • Showing main content before sidebars, related posts, comments, and other non-critical information
  • Streaming paginated or batched queries as they progress instead of one big, expensive database query (sketched below)
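That last one deserves a sketch. Assuming a hypothetical fetchProductPage(page) helper that returns one batch of rows at a time, a handler can flush each batch as HTML the moment the database hands it over:

async function streamResults(res) {
  res.write('<ul class="results">')

  // Flush each batch of rows as soon as the database returns it,
  // instead of buffering the whole result set in memory first.
  for (let page = 0; ; page++) {
    const rows = await fetchProductPage(page) // hypothetical paginated query
    if (rows.length === 0) break
    res.write(rows.map(row => `<li>${row.name}</li>`).join(''))
  }

  res.write('</ul>')
}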

Beyond the obvious visual speedup, streamed HTML has other benefits:

Interactive ASAP

If a user visits the homepage and immediately tries searching, they don’t have to wait for anything but the header to submit their query.

Optimized asset delivery

Even with no <body> to show, you can stream the <head>. That lets browsers download and parse styles, scripts, and other assets while waiting for the rest of the HTML.

Less server effort

Streamed HTML uses less memory. Instead of building the full response in RAM, it sends generated bytes immediately.

More robust and faster than incremental updates via JavaScript

Fewer roundtrips, happens before/while JS boots, and stays immune to JS errors and the other reasons 1% of visits have broken JavaScript.

And because it’s more efficient, that leaves more CPU and RAM for the JavaScript we do run, not to mention painting, layout, and user interactions.


Hopefully, you see why I considered HTML streaming a must.

And that’s why not Svelte

Previously…

Maybe if I sprinkled the HTML with just enough CSS to look good… and if I had any room left, some laser-focused JavaScript for the pieces that benefit most from complex interactivity.

That’s exactly what Svelte excels at. So why didn’t I use it?

Because Svelte does not stream HTML. (I hope it does someday.)

If not Svelte, then what?

I found only 2 things on NPM that could stream HTML:

  1. Dust, a template language that seems to have died twice.
  2. Marko, some library with an ungoogleable name and a rainbow logo… oh, and JSX-like syntax? And a client-side virtual DOM that fit in my budget? And eBay has battle-tested it for its ecommerce websites? And it only uses client-side JS for stateful components? You don’t say.

It’s nice when a decision makes itself.

And thus, Marko.

Marko’s <await> made streaming easy

Marko streams HTML with its <await> tag. I was pleasantly surprised at how easily it could optimize browser rendering, with all the control I wanted over HTTP, HTML, and JavaScript.

Disclaimer
I now work for eBay, but I didn’t yet when I wrote this post.

[Animation: buffered pages don’t show content as it loads, but Marko’s streaming pages show content incrementally. Source: markojs.com/#streaming]

As seen in Skeleton screens, but fast:



<SiteHead />

<h1>Search for “${searchQuery}”</h1>

<div.SearchSkeletons>
  <await(searchResultsFetch)> <!-- stalls the HTML stream until the API returns search results -->
    <@then|result|>
      <for|product| of=result.products>
        <ProductCard product=product />
      </for>
    </@then>
  </await>
</div>



<await> for nice-to-haves

Imagine a component that displays recommended products. Fetching the recommendations is usually fast, but every once in a while, the API hiccups. <await>’s got your back:



<await(productRecommendations)
    timeout=50> <!-- wait up to 50ms -->
  <@then|recs|>
    <RecommendedProductList of=recs />
  </@then>

  <@catch>
    <!-- don’t render anything; no big deal if this fails -->
  </@catch>
</await>



If you know how much money product recommendations make, you can fine-tune the timeout so the cost of the performance hit never exceeds that revenue.
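As a back-of-the-envelope sketch (every number below is made up for illustration), the break-even timeout falls right out of the arithmetic:

// Hypothetical numbers; plug in your own analytics instead.
const revenuePerRecsImpression = 0.02   // $ earned when recs render
const revenueLostPerMsOfDelay  = 0.0004 // $ lost per ms the page stalls

// Past this point, waiting for recommendations costs more than they earn:
const breakEvenTimeoutMs =
  revenuePerRecsImpression / revenueLostPerMsOfDelay // 50, as in the example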

And that’s not all!



<await(productRecommendations) client-reorder>
  <@placeholder>
    <!-- immediately render placeholder to prevent content jumping around -->
    <RecommendedProductsPlaceholder /> 
  </@placeholder>

  <@then|recs|>
    <RecommendedProductList of=recs />
  </@then>
</await>



The client-reorder attribute turns the <await> into an HTML fragment that doesn’t delay the rest of the page behind it, but asynchronously renders when ready. client-reorder requires JavaScript, so you can weigh the tradeoffs of using it vs. a timeout with no fallback. (I think you can even combine them.)

That’s how Facebook’s BigPipe renderer worked, which once lived on the same page as React. Wouldn’t it be nice to have the best of both?

Let me tell you: it is nice.

Marko’s <await> is awesome

Best of all, these <await> techniques are Marko’s golden path — heck, its very reason for being. Marko has stream control no other renderer makes easy, a way to automatically upgrade streamed HTML with JavaScript, and 8+ years of experience with the inevitable bugs and edge cases.

(Yes, I was quite taken with Marko. Let me have my fun.)

However, the fact that Marko was apparently my one option does raise a certain question…

Why is HTML streaming not common?

Or in the words of another developer after my demo: “if Chunked Transfer-Encoding is so useful, how come I’ve never heard of it?”

That is a very fair question. It’s not because it’s poorly-supported — HTML rendered progressively in Netscape 1.0. Beta Netscape 1.0. And it’s not because the technique is barely-used — Google search results stream after the top navbar, for instance.

I think one reason is the inconsistent name

  • Steve Souders called it “early flushing”, which is not… the best name.
  • “Chunked transfer-encoding” is the most distinctive, but it only exists in HTTP/1.1. HTTP/2, HTTP/3, and even HTTP/0.9 stream differently.
  • It was known as “HTTP streaming” before HLS, DASH, and other forms of video-over-HTTP took up that mindspace.
  • The catch-all term is “progressive rendering”, but that applies to many other things: interlaced images, visualizing large datasets, video game engine optimizations, etc.

Many languages/frameworks don’t care for streaming

Older languages/frameworks have long been able to stream HTML, but were never really good at it. Some examples:

PHP 🐘

Requires calling inscrutable output-buffering functions in a finicky order.

Ruby on Rails 🛤

ActionController::Streaming has a lot of caveats. In particular:

This approach was introduced in Rails 3.1 and is still improving. Several Rack middlewares may not work and you need to be careful when streaming. Those points are going to be addressed soon.

Rails hit 3.1 in 2011. There was clearly not much demand to address those points.

(Rails’ modern way is Turbo Streams, but those need JS to render, so not the same thing.)

Django 🐍

Django really doesn’t like streaming at all:

StreamingHttpResponse should only be used in situations where it is absolutely required that the whole content isn’t iterated before transferring the data to the client.

Perl 🐪

Perl’s autostream behavior is controlled by a $| variable (yes, that’s a pipe), but that sort of nonsense is normal for it. God I love Perl.

Because streaming was never their default happy path, languages/frameworks considered it a last resort where you gained performance at the expense of the “real” render features. Here’s a telling quote:

You can still write ASP.NET pages that properly stream data to the browser using Response.Write and Response.Flush. But you can’t do it within the normal ASP.NET page lifecycle. Maybe this is a natural consequence of the ASP.NET abstraction layer.

Regardless, it still sucks for users.

The Lost Art of Progressive HTML Rendering

Node.js is a happy exception. As proudly described on Node’s About page:

HTTP is a first-class citizen in Node.js, designed with streaming and low latency in mind.
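And it shows. Here’s a minimal sketch using Node’s built-in http module; res.write flushes chunks as they’re written (Node handles chunked transfer-encoding on its own), and fetchSearchResults is a hypothetical stand-in for the slow API work:

const http = require('http')

http.createServer(async (req, res) => {
  res.writeHead(200, { 'content-type': 'text/html;charset=utf-8' })

  // Flush the <head> and site header immediately, so the browser can
  // download CSS/JS and paint something while we wait on slow APIs.
  res.write('<!doctype html><html><head><title>Search</title></head><body>')
  res.write('<header>Site header, search box, etc.</header>')

  // Only now do we pay for the slow data sources…
  const resultsHtml = await fetchSearchResults(req.url) // hypothetical helper

  // …and the rest of the page streams in the moment it finishes.
  res.write(`<main>${resultsHtml}</main>`)
  res.end('</body></html>')
}).listen(8080)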

Despite that, the “new” hot JavaScript frameworks have been struggling to stream for a while.

These frameworks have the staff, funding, and incentives to make streaming work, so the holdup must be something else. Maybe it’s hard to retrofit streaming onto their abstractions, especially without ruining established third-party integrations.

Streaming rarely mentioned as a TTFB fix

As mentioned near the beginning, when high TTFB is detected, streaming is almost never suggested as a fix.

I think that’s the biggest problem. A Web API with a bad name can become popular if it’s mentioned enough.

Personally, I’ve only seen streaming HTML recommended for TTFB once, and it’s in chapter 10 of High-Performance Browser Networking. In an aside. At the bottom.

(Inside a <details> labeled “Beware of The Leopard”.)

So that’s one silver bullet down

I had streaming HTML, but that was no substitute for the other 999 lead bullets to back it up. Now I had to… make the website.

You know, write the components, style the design, build the features. How hard could that be? (Hint: people are paid to do those things.)
