Skymood - Watch Bluesky's heartbeat through emojis in real-time 🌟

skorfmann

Sebastian Korfmann

Posted on November 18, 2024

This is a brief background story of how I built Skymood over the course of two evenings.

Since becoming more active on Bluesky again, I started looking into how Bluesky is built behind the scenes. The backbone that ties everything together is the Firehose, a full stream of events (posts, likes, follows, handle changes, etc.). While that's one of the coolest aspects of Bluesky and ATProto, it's also a ton of data (roughly 50 GB per day as of today) that has to be transferred and processed, since it's an all-or-nothing approach with no filtering option.

However, there's another option that was released a few weeks ago: Jetstream. In contrast to the Firehose, it allows filtering by Collection NSIDs and Repositories. This means we can filter for, say, all posts, either globally or scoped to a given set of user IDs. That sounds pretty intriguing, doesn't it?
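To make the filtering idea concrete, here is a minimal sketch that builds a Jetstream subscription URL. The `wantedCollections` parameter is the one used in the commands in this post; `wantedDids` is, to my understanding, the Jetstream parameter for scoping to repositories, so treat it as an assumption. `buildJetstreamUrl` and the example DID are hypothetical.

```typescript
// Sketch: build a Jetstream subscription URL with optional filters.
// `buildJetstreamUrl` is a hypothetical helper, not part of any SDK.
const JETSTREAM_ENDPOINT = "wss://jetstream2.us-east.bsky.network/subscribe";

interface JetstreamFilter {
  collections?: string[]; // Collection NSIDs, e.g. "app.bsky.feed.post"
  dids?: string[]; // Repositories (user DIDs); parameter name is an assumption
}

function buildJetstreamUrl(filter: JetstreamFilter = {}): string {
  const url = new URL(JETSTREAM_ENDPOINT);
  for (const nsid of filter.collections ?? []) {
    url.searchParams.append("wantedCollections", nsid);
  }
  for (const did of filter.dids ?? []) {
    url.searchParams.append("wantedDids", did);
  }
  return url.toString();
}

// All posts, globally
const globalPosts = buildJetstreamUrl({ collections: ["app.bsky.feed.post"] });

// Posts from a single (made-up) repository only
const scopedPosts = buildJetstreamUrl({
  collections: ["app.bsky.feed.post"],
  dids: ["did:plc:example"],
});
```

Repeating the parameter once per value keeps the helper trivial; `URLSearchParams` takes care of the escaping.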

As one does, I started playing around with just fetching a bunch of posts, looking for a way to demonstrate what's possible. Watching the stream of posts fly by, I thought it would be nice to get some kind of mood board based on the emojis. One prompt later:

[Image]

Well, that's promising, so let's evolve that into a website we can put on the Internet.

Literally Serverless

The Jetstream Websocket endpoint is open and can be connected to from any browser. So technically, there's no reason why a server would be required. And sure enough, consuming the Jetstream filtered for posts is totally doable.

Just give it a try in your terminal:

$ websocat wss://jetstream2.us-east.bsky.network/subscribe\?wantedCollections=app.bsky.feed.post

So I went ahead and threw together a quick React SPA using Waku that subscribed straight to the Jetstream WebSocket from the client side. The prototype was done pretty quickly. The only noticeable hurdle was that the Jetstream SDK depends on node:events. Rather than trying to work around that, it seemed a lot simpler to just go with a plain WebSocket implementation.

  const ws = new WebSocket('wss://jetstream2.us-west.bsky.network/subscribe?wantedCollections=app.bsky.feed.post');

  ws.addEventListener('message', async (event) => {
    console.log(event)
  })
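Logging the raw event is enough to see data flowing. To get at the actual post text, the message has to be parsed first. The sketch below assumes the Jetstream event shape as I understand it (a `commit` object carrying `operation`, `collection` and `record`); `postTextFromMessage` is a hypothetical helper and only the fields it uses are typed.

```typescript
// Sketch: pull the post text out of a raw Jetstream message.
// The event shape is an assumption; real events carry more fields.
interface JetstreamEvent {
  did: string;
  kind: string; // "commit" for record changes
  commit?: {
    operation: string; // e.g. "create" or "delete"
    collection: string;
    record?: { text?: unknown };
  };
}

function postTextFromMessage(raw: string): string | null {
  let event: JetstreamEvent;
  try {
    event = JSON.parse(raw) as JetstreamEvent;
  } catch {
    return null; // not valid JSON, ignore
  }
  if (typeof event !== "object" || event === null) return null;

  const commit = event.commit;
  if (event.kind !== "commit" || !commit) return null;
  if (commit.operation !== "create") return null; // deletes have no record
  if (commit.collection !== "app.bsky.feed.post") return null;

  const text = commit.record?.text;
  return typeof text === "string" ? text : null;
}
```

In the browser this would be called as `postTextFromMessage(event.data)` inside the message listener.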

I was kind of expecting some performance issues, but it ran rather smoothly. The only small optimization was to debounce rendering, so that not every single new emoji triggers a re-render.
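One way to keep rendering in check is to buffer incoming emojis and flush them on a timer. This is interval-based batching, a close cousin of debouncing, and not necessarily what Skymood does. `EmojiBatcher`, `startRendering` and the 250 ms interval are made up for illustration.

```typescript
// Sketch: collect emojis as they arrive and hand them to the UI in batches.
class EmojiBatcher {
  private pending = new Map<string, number>();

  // Called for every incoming emoji; cheap, triggers no rendering.
  add(emoji: string): void {
    this.pending.set(emoji, (this.pending.get(emoji) ?? 0) + 1);
  }

  // Returns the counts gathered since the last flush and resets the buffer.
  flush(): Map<string, number> {
    const batch = this.pending;
    this.pending = new Map();
    return batch;
  }
}

// Hypothetical wiring: call `render` at most once per interval.
// Returns a function that stops the timer.
function startRendering(
  batcher: EmojiBatcher,
  render: (batch: Map<string, number>) => void,
  intervalMs = 250,
): () => void {
  const timer = setInterval(() => {
    const batch = batcher.flush();
    if (batch.size > 0) render(batch);
  }, intervalMs);
  return () => clearInterval(timer);
}
```

In a React app, `render` would typically be a state setter that merges the batch into the current counts, so the component re-renders a few times per second instead of once per message.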

I deployed this to Cloudflare and was pretty much done. Well, not so fast. Let's look at the consumed bandwidth.

$ websocat wss://jetstream2.us-east.bsky.network/subscribe\?wantedCollections=app.bsky.feed.post | pv > /dev/null

Depending on the time of day, this currently uses between roughly 60 KB/s and 150 KB/s for the global posts stream, and it's gonna increase by the day. Not really an issue on a fast, unmetered fibre connection, but on mobile it might be another story. Last but not least, it feels wrong to let the client do all the expensive work while potentially increasing the bandwidth & resources bill for the Bluesky team. Back to the drawing board.
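To put those rates into perspective for a metered mobile plan, here is a quick back-of-the-envelope conversion. It uses decimal units (1 MB = 1000 KB) and assumes the rate stays constant, which it won't in practice; `volumeMB` is just an illustrative helper.

```typescript
// Sketch: convert a sustained rate in KB/s into a data volume in MB.
function volumeMB(rateKBps: number, seconds: number): number {
  return (rateKBps * seconds) / 1000;
}

const HOUR = 3600;
const DAY = 24 * HOUR;

console.log(volumeMB(60, HOUR)); // 216 MB per hour at the low end
console.log(volumeMB(150, HOUR)); // 540 MB per hour at the high end
console.log(volumeMB(60, DAY) / 1000); // 5.184 GB per day
console.log(volumeMB(150, DAY) / 1000); // 12.96 GB per day
```

So a tab left open for an hour would pull somewhere between 216 MB and 540 MB.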

Cloudflare Durable Objects

Cloudflare's Durable Objects had been on my list to try for a long time, and handling WebSockets is one of their major use cases. Rewriting the React client-side handling was just a few prompts away and I was good to go.

The main changes were to subscribe to Jetstream as an upstream connection, match for emojis and publish the emojis to subscribed clients.

// Track connected WebSocket clients and their emoji filters
const clients = new Map<WebSocket, Set<string>>();

// When receiving a message from the upstream Jetstream connection (`ws`)
ws.addEventListener('message', (event) => {
  try {
    const data = JSON.parse(event.data as string);
    // Jetstream nests the post record under commit.record;
    // skip events that carry no text (e.g. deletes)
    const postText = data.commit?.record?.text;
    if (typeof postText !== 'string') return;

    // Extract unique emojis from the text
    const emojiRegex = /[\p{Emoji_Presentation}\p{Extended_Pictographic}]/gu;
    const emojis = [...new Set(postText.match(emojiRegex) || [])];

    // Skip if no emojis found
    if (emojis.length === 0) return;

    // Notify all connected clients
    for (const [client, filters] of clients) {
      // Send the list of emojis found
      client.send(JSON.stringify({
        type: 'emojis',
        emojis
      }));

      // If client has emoji filters and post contains matching emoji,
      // send the full post details
      if (filters.size > 0 && emojis.some(emoji => filters.has(emoji))) {
        client.send(JSON.stringify({
          type: 'post',
          text: postText,
          emojis,
          timestamp: Date.now()
        }));
      }
    }
  } catch (error) {
    console.error('Error processing message:', error);
  }
});
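The emoji matching can be tried in isolation. Here is a small sketch using the same regex as above, wrapped in a hypothetical `extractEmojis` helper:

```typescript
// Same character class as in the handler above: matches one code point at a time.
const emojiRegex = /[\p{Emoji_Presentation}\p{Extended_Pictographic}]/gu;

// Returns the distinct emojis in `text`, in order of first appearance.
function extractEmojis(text: string): string[] {
  const matches: string[] = text.match(emojiRegex) ?? [];
  return Array.from(new Set(matches));
}

console.log(extractEmojis("Good morning have a lovely day 🥰")); // ["🥰"]
console.log(extractEmojis("🥰🥰 ✨")); // ["🥰", "✨"]
console.log(extractEmojis("no emojis here")); // []
```

Because the regex matches single code points, variation selectors are dropped and emoji composed of several code points come out as their individual parts. That is why a red heart shows up as a bare "❤" in the stream.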

And with that, we were down to somewhere between 0.5 and 3 KB/s per client connection, while the server does the heavy lifting only once. That's a lot better!

However, while the Durable Object was doing OK, it seemed a bit slow (gut feeling, no data) and close to the maximum memory of 128 MB (see limits). What to do? So far it's a single Durable Object. Ideas from the Cloudflare forum and Discord were along the lines of sharding, i.e. introducing a few Durable Objects for client handling while a single one does the upstream processing. That's a lot of complexity right there. I'm sure there are better ways. If anyone at Cloudflare reads this: I'd be more than happy to iterate on my approach. But for now I just wanted to ship it. So off to the next chapter.

Pivot: Bun.js

Bun.js is another thing I had wanted to look into for a while. In the back of my head it's categorized as "Node.js, but more performant". It turned out that Bun has a custom WebSocket server baked in, which claims 7x more throughput compared to Node.js with ws. I haven't verified those claims, but that's enough of an excuse to use it in this case.

Again, a few prompts later the new version was up and running, deployed to Fly.io on a rather small machine in Frankfurt, Germany. Let's see:

$ websocat wss://skymood-bun.fly.dev
{"type":"clientCount","count":1}
{"type":"emojis","emojis":["😘"]}
{"type":"emojis","emojis":["🧐"]}
{"type":"emojis","emojis":["🙏"]}
{"type":"emojis","emojis":["😀"]}
{"type":"emojis","emojis":["😂"]}
...

And the filtered version:

$ (echo '{"type":"filter","emoji":"🥰"}'; cat) | websocat wss://skymood-bun.fly.dev
{"type":"emojis","emojis":["😊"]}
{"type":"emojis","emojis":["‼","✨"]}
{"type":"emojis","emojis":["🥰"]}
{"type":"post","text":"Good morning have a lovely day 🥰","url":"https://bsky.app/profile/did:plc:q4st6fcbn5jdi7xdf4o7a2jy/post/3lb7jrngi7c2m","timestamp":1731919511338,"emojis":["🥰"]}
{"type":"emojis","emojis":["🌻","🔪"]}
{"type":"emojis","emojis":["❤"]}

Beautiful :)
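The message shapes in the output above suggest a small per-client protocol. Here is a runtime-agnostic sketch of it, not the actual Skymood server code: `EmojiRelay`, `Client` and `Post` are hypothetical names. It also assumes that each filter message adds one emoji to that client's filter set and that the client count is broadcast to everyone on connect and disconnect.

```typescript
// Minimal stand-in for a WebSocket; only `send` is needed here.
interface Client {
  send(data: string): void;
}

interface Post {
  text: string;
  url: string;
  emojis: string[];
}

class EmojiRelay {
  private filters = new Map<Client, Set<string>>();

  connect(client: Client): void {
    this.filters.set(client, new Set());
    this.broadcast({ type: "clientCount", count: this.filters.size });
  }

  disconnect(client: Client): void {
    this.filters.delete(client);
    this.broadcast({ type: "clientCount", count: this.filters.size });
  }

  // Handles a client message such as {"type":"filter","emoji":"🥰"}.
  // Assumption: every filter message adds one emoji to the client's set.
  handleClientMessage(client: Client, raw: string): void {
    let msg: unknown;
    try {
      msg = JSON.parse(raw);
    } catch {
      return; // ignore malformed input
    }
    if (typeof msg !== "object" || msg === null) return;
    const { type, emoji } = msg as { type?: unknown; emoji?: unknown };
    if (type === "filter" && typeof emoji === "string") {
      this.filters.get(client)?.add(emoji);
    }
  }

  // Called once per upstream post that contains at least one emoji.
  publish(post: Post): void {
    // Serialize once, then fan out the same strings to every client.
    const emojisMsg = JSON.stringify({ type: "emojis", emojis: post.emojis });
    const postMsg = JSON.stringify({ type: "post", ...post, timestamp: Date.now() });
    for (const [client, wanted] of this.filters) {
      client.send(emojisMsg);
      if (post.emojis.some((e) => wanted.has(e))) client.send(postMsg);
    }
  }

  private broadcast(message: object): void {
    const data = JSON.stringify(message);
    for (const client of this.filters.keys()) client.send(data);
  }
}
```

Keeping the relay free of runtime-specific APIs makes it easy to test with fake clients. A Bun, Node.js or Durable Object WebSocket handler would only need to call `connect`, `handleClientMessage` and `disconnect` at the right moments.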

Quite a journey, but we now have a rather robust WebSocket relay, which connects to the Bluesky Jetstream once and republishes only a small subset of relevant messages. Find the current server code over at GitHub.

Make sure to check out the end result at https://skymood.skorfmann.com/ and don't forget to share this post, the website, or both :) Also, post comments either here or on this Bluesky post.
