Chromium Spelunking: Getting Started

djmitche

Posted on April 24, 2023

Introduction

I'm now working day-to-day on Chromium, the open-source codebase behind Chrome. It's a C++ codebase with a healthy dose of Java and JS thrown in, although I'm mostly in the C++ bits. It's been a steep learning curve, and I'd like to start documenting that learning curve with a few goals in mind:

  • Structure my own thinking and learning about Chromium.
  • Help others who are beginning to work on Chromium.
  • Start to identify some ways that Chromium could improve its approachability.

That term will become a theme. An approachable codebase is one where it is easy for a newcomer to get started. I think this is an important aspect of any codebase, but especially open source codebases. Developers move from project to project all the time, as I just did. An approachable codebase lets a developer get going quickly. It benefits existing developers, too: if new developers can find answers to questions, then more experienced developers don't have to spend time answering those questions.

In the last few weeks, several of my questions have garnered answers of the form "I don't work on Chromium anymore, but ...". While I feel for these experienced engineers haunted by the ghosts of projects past, if the codebase were more approachable then I wouldn't have to ask them!

What to Expect

I'll be blogging about my spelunking as it happens. Think of this as a lightly-edited lab notebook: all of the false starts, incorrect assumptions, and missed connections are here for you to see. I'll try to use complete sentences, explain things clearly, and organize my thoughts into a coherent order within each post.

The Task

The project I'm gearing up for involves adding some additional functionality to Chromium's network stack, to allow it to proxy QUIC connections over other QUIC connections. It's OK if you don't know what that means just yet -- I only have the vaguest sense myself. For the moment, I need to know how the network implementation is put together, so that I can see what parts I will need to modify.

Big Picture

My first step is to get the "big picture": the major components and patterns. With this information, I can start looking at the details, confident that I understand where those details fit. This is a "top-down" approach, and I typically begin it by looking for developer documentation. In the case of the Chromium network layer, I found the following:

A few general lessons from the coding patterns:

  • Lots of functions in the stack's API use variants of the libc return value style: negative numbers are error codes, positive numbers indicate success, perhaps as a byte count, and zero can indicate simple success or EOF, depending on context.
  • Some functions can finish either synchronously or asynchronously. As a baseline, a C syscall like write(2) on a non-blocking file descriptor will fail with EAGAIN (returning -1 and setting errno) when it would otherwise block, and the expectation is that it will be called again when the application believes the write might succeed. This would continue until the call is successful. The Chromium functions all take a callback, and it's unclear from this description whether a synchronous completion invokes this callback, or expects the caller to do so. So, I'll need to figure that out, and document it.
  • It's common for network types to be structured as state machines, with a DoLoop at the core calling DoXxx methods based on a next_state_ instance variable until either it must wait for something to complete (signaled by ERR_IO_PENDING) or the request is complete. There's a bit more detail about what is responsible for calling the callback, which I'll need to explore as I start reading the code.
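To make these patterns concrete for myself, here is a minimal, self-contained sketch of the DoLoop shape. It is not Chromium code: FakeRequest, CompletionCallback, OnIOComplete, and the numeric values of OK and ERR_IO_PENDING are hypothetical stand-ins I chose for illustration. It also assumes one answer to the open question above, namely that the callback runs only when the operation completed asynchronously. That assumption still needs to be checked against the real code.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Local stand-ins for the return-value convention (values chosen for this sketch).
constexpr int OK = 0;               // simple success
constexpr int ERR_IO_PENDING = -1;  // "not done yet; the callback will run later"

using CompletionCallback = std::function<void(int)>;

// Hypothetical two-step operation: "connect" completes asynchronously,
// then "send" completes synchronously with a byte count.
class FakeRequest {
 public:
  // Returns the final result if everything finished synchronously, or
  // ERR_IO_PENDING, in which case `callback` later receives the final result.
  // ASSUMPTION: the callback is not invoked on synchronous completion.
  int Start(CompletionCallback callback) {
    assert(next_state_ == STATE_NONE);  // only one operation in flight
    next_state_ = STATE_CONNECT;
    int rv = DoLoop(OK);
    if (rv == ERR_IO_PENDING)
      callback_ = std::move(callback);
    return rv;
  }

  // Stands in for the event loop reporting that the pending step finished.
  void OnIOComplete(int result) {
    int rv = DoLoop(result);
    if (rv != ERR_IO_PENDING && callback_) {
      CompletionCallback cb = std::move(callback_);
      callback_ = nullptr;
      cb(rv);
    }
  }

 private:
  enum State { STATE_NONE, STATE_CONNECT, STATE_CONNECT_COMPLETE, STATE_SEND };

  // Runs DoXxx steps until one must wait or no next state is set.
  int DoLoop(int result) {
    int rv = result;
    do {
      State state = next_state_;
      next_state_ = STATE_NONE;
      switch (state) {
        case STATE_CONNECT:
          rv = DoConnect();
          break;
        case STATE_CONNECT_COMPLETE:
          rv = DoConnectComplete(rv);
          break;
        case STATE_SEND:
          rv = DoSend();
          break;
        case STATE_NONE:
          break;
      }
    } while (rv != ERR_IO_PENDING && next_state_ != STATE_NONE);
    return rv;
  }

  int DoConnect() {
    next_state_ = STATE_CONNECT_COMPLETE;
    return ERR_IO_PENDING;  // pretend the connection is in flight
  }

  int DoConnectComplete(int result) {
    if (result < 0)
      return result;  // negative means error; no next state, so the loop ends
    next_state_ = STATE_SEND;
    return OK;
  }

  int DoSend() {
    return 5;  // positive means success: pretend 5 bytes were written
  }

  State next_state_ = STATE_NONE;
  CompletionCallback callback_;
};
```

In this sketch, Start() returns ERR_IO_PENDING because the first step has to wait. A later OnIOComplete(OK) resumes the loop, runs the remaining steps, and hands the final result (here, a byte count of 5) to the callback. Passing a negative value to OnIOComplete() instead ends the loop early and delivers that error to the callback.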

The "Life of a URLRequest" document is long and detailed, so I'll save that and the proxy document for the next post.
