How does the internet work? Part 1

Kavya Sahai

Posted on October 5, 2024

Ever wonder what happens when you click a link? 🌐 How The Internet Works takes you behind the scenes of the digital world, breaking down complex tech into simple, bite-sized insights. From data packets to servers and beyond, discover the magic that powers your online experience! (Hook written with the help of AI, because I can't :D)

What happens when you go to google.com?

The "g" key is pressed

The following sections explain the physical keyboard actions and the OS interrupts. First, though, consider what the browser does: when you press the "g" key, the browser registers the event, triggering the auto-complete functions. Based on your browser's algorithm and whether you're in regular or private/incognito mode, various suggestions appear in a dropdown beneath the URL bar.

These suggestions are typically prioritized and sorted using factors such as your search history, bookmarks, cookies, and popular internet searches. As you continue typing "google.com," numerous processes run in the background, and the suggestions refine with each keystroke. The browser might even predict "google.com" before you've finished typing.
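As a rough illustration of how suggestions might narrow with each keystroke, here is a minimal sketch. It assumes the only ranking signal is a visit count per host; the `suggest` function and the `history` data are hypothetical, and real browsers combine many more signals.

```python
def suggest(prefix, visit_counts, limit=5):
    """Return up to `limit` known hosts starting with `prefix`, most visited first."""
    prefix = prefix.lower()
    matches = [host for host in visit_counts if host.startswith(prefix)]
    # Sort by visit count (descending), then alphabetically for a stable order.
    matches.sort(key=lambda host: (-visit_counts[host], host))
    return matches[:limit]


# Hypothetical browsing history: host -> number of visits.
history = {"google.com": 42, "github.com": 17, "gmail.com": 30, "example.org": 3}

print(suggest("g", history))   # ['google.com', 'gmail.com', 'github.com']
print(suggest("go", history))  # ['google.com']
```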

[Figure: Autocomplete sequence diagram]

The "enter" key bottoms out

To establish a starting point, let's consider the Enter key on a keyboard when it reaches the bottom of its travel range. At this moment, an electrical circuit dedicated to the Enter key is closed (either mechanically or capacitively), allowing a small current to flow into the keyboard's logic circuitry. This circuitry scans the state of each key switch, filters out electrical noise from the rapid closure of the switch (debouncing), and translates the action into a keycode—in this case, the integer 13. The keyboard controller then encodes this keycode for transmission to the computer. Today, this is almost always done over a Universal Serial Bus (USB) or Bluetooth connection, though older systems used PS/2 or ADB.

In the case of a USB keyboard:

  • The keyboard is powered by a 5V supply delivered through pin 1 of the computer's USB host controller.
  • The keycode generated by the keypress is stored in an internal register known as the "endpoint."
  • The USB host controller polls this "endpoint" roughly every 10ms (the minimum interval set by the keyboard), retrieving the stored keycode.
  • The keycode is sent to the USB Serial Interface Engine (SIE), where it is converted into one or more USB packets in accordance with the USB protocol.
  • These packets are transmitted over the D+ and D- lines (the two middle pins) at a maximum rate of 1.5 Mb/s, as the keyboard is classified as a "low-speed device" (per USB 2.0 standards).
  • The computer's host USB controller decodes this serial signal, and the Human Interface Device (HID) driver interprets the keypress. Finally, the key event is passed to the operating system's hardware abstraction layer.

[Figure: Sequence diagram for a USB keyboard]

In the case of a virtual keyboard (such as on touch screen devices):

  • When the user touches a capacitive touch screen, a small amount of current transfers to their finger. This interaction disturbs the electrostatic field of the screen’s conductive layer, creating a voltage drop at the point of contact.
  • The screen controller detects this and triggers an interrupt, reporting the coordinates of the touch.
  • The operating system then alerts the currently active application that a press event has occurred within its graphical interface, typically on a virtual keyboard button.
  • The virtual keyboard application raises a software interrupt, which notifies the operating system of a "key pressed" event.
  • The focused application receives this notification and processes the keypress accordingly.

[Figure: Sequence diagram for a virtual keyboard]

Interrupt Fires [Not for USB Keyboards]

For non-USB keyboards, such as those using legacy connections (e.g., PS/2), the keyboard signals an interrupt via its interrupt request line (IRQ). This IRQ is mapped to an interrupt vector (an integer) by the system's interrupt controller. The CPU consults the Interrupt Descriptor Table (IDT), which links each interrupt vector to a corresponding function known as an interrupt handler, supplied by the operating system’s kernel.

When the interrupt is triggered, the CPU uses the interrupt vector to index into the IDT and execute the appropriate interrupt handler. This process causes the CPU to transition into kernel mode, allowing the operating system to manage the keypress event.

A WM_KEYDOWN Message is Sent to the App (On Windows)

When the Enter key is pressed, the Human Interface Device (HID) transport passes the key down event to the KBDHID.sys driver, which converts the HID usage data into a scan code. Windows later maps this scan code to the virtual-key code VK_RETURN (0x0D), which represents the Enter key. The KBDHID.sys driver then communicates with the KBDCLASS.sys driver (the keyboard class driver), which handles all keyboard input in a secure manner. Along the way, the signal may pass through any third-party keyboard filters installed on the system, which also run in kernel mode.

Next, Win32K.sys comes into play, determining which window is currently active by invoking the GetForegroundWindow() API. This function retrieves the window handle (hWnd) of the active application, such as the browser’s address bar. At this point, the Windows "message pump" calls SendMessage(hWnd, WM_KEYDOWN, VK_RETURN, lParam). The lParam parameter contains a bitmask that provides additional information about the keypress, including:

  • Repeat count (1 for a single press that has not auto-repeated),
  • Scan code (which might be OEM-specific but is typically standard for the Enter key),
  • Extended-key flag (set for keys such as the numeric keypad Enter or the right-hand Alt and Ctrl keys), along with context, previous-state, and transition-state flags.

The SendMessage API queues the message for the specific window handle. Later, the system’s main message processing function (known as WindowProc) assigned to the window (hWnd) retrieves and processes messages in the queue.

The active window in this case is an edit control, and its WindowProc function has a message handler that responds to WM_KEYDOWN events. The handler checks the third parameter (wParam) passed by SendMessage, recognizes that the value is VK_RETURN, and thus determines that the user has pressed the Enter key. This triggers the appropriate response for the application.
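To make the lParam bitmask more concrete, here is a small sketch that unpacks its fields. The bit positions follow the documented WM_KEYDOWN layout as I understand it; the function name and the sample value are made up for illustration, with 0x1C being the scan code commonly reported for Enter.

```python
VK_RETURN = 0x0D  # virtual-key code for Enter, as mentioned above


def decode_keydown_lparam(lparam):
    """Split a WM_KEYDOWN lParam into its bit fields."""
    return {
        "repeat_count": lparam & 0xFFFF,              # bits 0-15
        "scan_code": (lparam >> 16) & 0xFF,           # bits 16-23
        "extended": bool((lparam >> 24) & 1),         # bit 24
        "context_code": bool((lparam >> 29) & 1),     # bit 29
        "previous_state": bool((lparam >> 30) & 1),   # bit 30
        "transition_state": bool((lparam >> 31) & 1)  # bit 31
    }


# Hypothetical lParam for a plain Enter press.
info = decode_keydown_lparam(0x001C0001)
print(info["repeat_count"], hex(info["scan_code"]), info["extended"])
```

A message handler would typically compare wParam against VK_RETURN first and only look at these fields when it needs the extra detail.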

A KeyDown NSEvent is Sent to the App (On OS X)

When a key is pressed on OS X, the interrupt signal triggers an event in the I/O Kit keyboard driver (a kernel extension or "kext"). This driver translates the hardware signal into a key code. The key code is then passed to the WindowServer, which manages the graphical user interface.

The WindowServer dispatches the key press event to the appropriate applications (such as the active or listening ones) by sending it through their Mach port, where it is placed into an event queue. Applications with the proper privileges can access this event queue by calling the mach_ipc_dispatch function.

Most applications handle this process through the NSApplication main event loop, which is responsible for processing user input. When the event is a key press, it is represented as an NSEvent of type NSEventTypeKeyDown. The application then reads this event and responds accordingly, triggering any code related to keypress actions based on the key code received.

The Xorg Server Listens for Keycodes (On GNU/Linux)

When a key is pressed in a graphical environment using the X server, the X server employs the evdev (event device) driver to capture the keypress event. The keycode from the physical keyboard is then re-mapped into a scancode using X server-specific keymaps and rules.

Once the mapping is complete, the X server forwards the resulting scancode to the window manager (such as DWM, Metacity, i3, etc.). The window manager, in turn, sends the character or key event to the currently focused window. The graphical API of the focused window processes this event and displays the corresponding symbol in the appropriate field, using the correct font, based on the key pressed.

This flow ensures that the character is correctly rendered in the active application’s interface, completing the keypress interaction from hardware to graphical output.

Parse URL

When the browser parses the URL (Uniform Resource Locator), it extracts the following components:

  • Protocol: "http". The browser understands that it should use the Hypertext Transfer Protocol to communicate with the server.
  • Resource: "/". This indicates that the browser should retrieve the main (index) page of the website, as the / path typically refers to the root or home page of the server.

Each of these components helps the browser interpret and fetch the desired resource from the web.
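For a quick feel of what this split looks like, Python's standard library can break a URL into the same parts. This is only an illustration; browsers use their own URL parsers.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://google.com/")

print(parts.scheme)    # http
print(parts.hostname)  # google.com
print(parts.path)      # /
```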

[Figure: URL parsing]

Is it a URL or a Search Term?

When no protocol (e.g., "http") or valid domain name is provided, the browser interprets the text in the address bar as a potential search term. Instead of trying to resolve it as a URL, the browser forwards the text to its default web search engine.

In most cases, the browser appends a special identifier to the search query, indicating that the request originated from the browser's URL bar. This allows the search engine to handle and prioritize these searches accordingly, improving the relevance of the results based on the context.

This process helps the browser determine whether it should attempt to navigate directly to a website or provide search results based on the entered text.
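The decision can be sketched with a simple heuristic. Everything here is an assumption for illustration: the `resolve_input` function, the hostname pattern, and the `search.example` endpoint are hypothetical, and real browsers apply far more elaborate rules.

```python
import re
from urllib.parse import quote_plus

# Hypothetical search endpoint; a real browser uses its configured default engine.
SEARCH_TEMPLATE = "https://search.example/?q={query}"

SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")
HOST_RE = re.compile(r"^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+(/.*)?$")


def resolve_input(text):
    """Return the URL the address bar text should navigate to."""
    text = text.strip()
    if SCHEME_RE.match(text):
        return text                       # explicit protocol: treat as a URL
    if " " not in text and HOST_RE.match(text):
        return "http://" + text           # looks like a domain name
    return SEARCH_TEMPLATE.format(query=quote_plus(text))  # search term


print(resolve_input("google.com"))
print(resolve_input("how does dns work"))
```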

Convert Non-ASCII Unicode Characters in the Hostname

  • The browser examines the hostname for any characters that fall outside the ASCII range, specifically those that are not in the sets a-z, A-Z, 0-9, the hyphen (-), or the dot (.).
  • In this case, the hostname is google.com, which contains only ASCII characters, so no conversion is necessary. However, if there were non-ASCII characters present in the hostname, the browser would apply Punycode encoding to convert the hostname into a valid ASCII representation. This process ensures that all characters in the hostname can be correctly processed by the network protocols.
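Python ships an "idna" codec that performs this kind of conversion, which makes for a short demonstration. Note that it implements an older IDNA revision than current browsers follow, so treat it as a sketch; `bücher.example` is just a sample hostname.

```python
def to_ascii_hostname(hostname):
    """Convert a hostname to its ASCII (Punycode) form, label by label."""
    return hostname.encode("idna").decode("ascii")


print(to_ascii_hostname("google.com"))      # google.com
print(to_ascii_hostname("bücher.example"))  # xn--bcher-kva.example
```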

Check HSTS List

The browser first checks its preloaded HSTS (HTTP Strict Transport Security) list, which contains websites that have explicitly requested to be accessed only via HTTPS.

If the requested website is found on this list, the browser automatically sends the request using HTTPS rather than HTTP. If the website is not in the HSTS list, the initial request is sent via HTTP.

It’s important to note that a website can still implement HSTS without being included in the preloaded list. In such cases, the first HTTP request made by the user will return a response instructing the browser to only send subsequent requests via HTTPS. However, this initial HTTP request could expose the user to a downgrade attack, where an attacker might intercept the request and force it to remain unencrypted. This vulnerability is why modern web browsers include the HSTS list, enhancing security for users by preventing insecure connections from being established in the first place.
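The check itself boils down to a lookup. In this minimal sketch the preloaded list is a hypothetical one-entry set, `learned_hsts` stands in for hosts that sent the HSTS header on an earlier visit, and only exact host matches are considered (real entries can also cover subdomains).

```python
# Hypothetical stand-in for the list shipped with the browser.
PRELOADED_HSTS = {"example-bank.test"}


def choose_scheme(host, learned_hsts=frozenset()):
    """Pick the scheme for the first request to `host`."""
    host = host.lower().rstrip(".")
    if host in PRELOADED_HSTS or host in learned_hsts:
        return "https"   # never send plain HTTP to a known HSTS host
    return "http"        # unknown host: the first request may go out unencrypted


print(choose_scheme("example-bank.test"))
print(choose_scheme("blog.example"))
```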

DNS Lookup

The browser begins the DNS lookup process by checking if the domain is already present in its cache. (To view the DNS cache in Google Chrome, navigate to chrome://net-internals/#dns.)

If the domain is not found in the cache, the browser calls the gethostbyname library function (the specific function may vary depending on the operating system) to perform the hostname resolution.

  1. Local Hosts File Check:

    • The gethostbyname function first checks if the hostname can be resolved by referencing the local hosts file, whose location varies by operating system. This file is a simple text file that maps hostnames to IP addresses and can provide a quick resolution without querying DNS.
  2. DNS Server Request:

    • If the hostname is not cached and cannot be found in the hosts file, the browser then sends a request to the DNS server configured in the network stack. This server is typically the local router or the ISP's caching DNS server, which stores previously resolved names to speed up future requests.
  3. ARP Process for DNS Server:

    • If the DNS server is on the same subnet, the network library follows the ARP (Address Resolution Protocol) process to resolve the IP address of the DNS server, ensuring that the request is directed correctly within the local network.
    • If the DNS server is on a different subnet, the network library instead follows the ARP process for the default gateway IP, which acts as an intermediary to route the request to the appropriate subnet.

This systematic approach ensures that the browser efficiently resolves domain names to IP addresses, enabling it to establish a connection to the desired website. By checking the cache first, using the local hosts file, and finally querying the DNS server, the browser minimizes the time spent on hostname resolution.
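The lookup order described above (cache, then hosts file, then DNS server) can be sketched as follows. The helper names are hypothetical, and the actual DNS query is passed in as a function so the example runs without a network.

```python
def parse_hosts(text):
    """Parse hosts-file text into a {hostname: address} table."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), address)
    return table


def resolve(hostname, cache, hosts_table, query_dns):
    """Return (address, source), trying the cache, hosts table, then DNS."""
    name = hostname.lower()
    if name in cache:
        return cache[name], "cache"
    if name in hosts_table:
        return hosts_table[name], "hosts"
    address = query_dns(name)   # e.g. a request to the configured DNS server
    cache[name] = address       # remember the answer for next time
    return address, "dns"


hosts = parse_hosts("127.0.0.1 localhost  # loopback\n")
cache = {}
fake_dns = lambda name: "192.0.2.10"   # placeholder answer from a documentation range

print(resolve("localhost", cache, hosts, fake_dns))
print(resolve("google.com", cache, hosts, fake_dns))
print(resolve("google.com", cache, hosts, fake_dns))
```

In everyday Python code, `socket.getaddrinfo` takes care of the hosts file and DNS steps for you.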

[Figure: DNS lookup sequence diagram]

ARP Process

In order to send an ARP (Address Resolution Protocol) broadcast, the network stack library needs two key pieces of information: the target IP address that needs to be looked up and the MAC address of the interface that will be used to send out the ARP broadcast.

Checking the ARP Cache:

The ARP cache is first checked for an entry corresponding to the target IP address. If an entry exists, the library function returns the result in the format:
Target IP = MAC.

If the Entry is Not in the ARP Cache:

If there is no entry for the target IP address, the following steps are taken:

  • The route table is consulted to determine whether the target IP address is on any of the subnets listed in the local route table.
    • If it is found, the library uses the interface associated with that subnet.
    • If not, the library defaults to using the interface that connects to the default gateway.
  • The MAC address of the selected network interface is then retrieved.

Sending the ARP Request:

The network library constructs and sends a Layer 2 (data link layer of the OSI model) ARP request with the following format:

  • Sender MAC: interface:mac:address:here
  • Sender IP: interface.ip.goes.here
  • Target MAC: FF:FF:FF:FF:FF:FF (Broadcast)
  • Target IP: target.ip.goes.here
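The fields above map onto a 28-byte ARP payload for IPv4 over Ethernet. Here is a sketch that packs one; the function names and addresses are made up, and the target MAC follows the format listed above (many stacks instead zero this field and put the broadcast address only in the Ethernet frame header).

```python
import socket
import struct


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_arp_request(sender_mac, sender_ip, target_ip,
                      target_mac="ff:ff:ff:ff:ff:ff"):
    """Pack an ARP request payload (without the Ethernet frame header)."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length
        4,                            # protocol address length
        1,                            # opcode: request
        mac_to_bytes(sender_mac),
        socket.inet_aton(sender_ip),
        mac_to_bytes(target_mac),
        socket.inet_aton(target_ip),
    )


packet = build_arp_request("02:00:00:00:00:01", "192.168.1.10", "192.168.1.1")
print(len(packet), packet.hex())
```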

Depending on the hardware setup between the computer and the router, the behavior of the ARP request varies:

Directly Connected:

If the computer is directly connected to the router, the router will respond with an ARP Reply (see below).

Hub:

If the computer is connected to a hub, the hub will broadcast the ARP request out of all its other ports. If the router is connected to the same "wire," it will respond with an ARP Reply (see below).

Switch:

If the computer is connected to a switch, the switch will check its local CAM/MAC table to identify which port has the MAC address being queried. If the switch has no entry for the MAC address, it will rebroadcast the ARP request to all other ports. If the switch does have an entry in its MAC/CAM table, it will send the ARP request only to the port that has the corresponding MAC address.

  • If the router is on the same "wire," it will respond with an ARP Reply (see below).

ARP Reply:

The ARP reply will have the following format:

  • Sender MAC: target:mac:address:here
  • Sender IP: target.ip.goes.here
  • Target MAC: interface:mac:address:here
  • Target IP: interface.ip.goes.here

Now that the network library has obtained the MAC address of either the DNS server or the default gateway (the IP address was already known), it can resume its DNS process:

  1. The DNS client establishes a socket connection to UDP port 53 on the DNS server, utilizing a source port above 1023.
  2. If the response size exceeds the UDP limit, TCP will be used instead to accommodate the larger response.
  3. If the local or ISP DNS server does not have the requested information, it will initiate a recursive search, querying a hierarchy of DNS servers until the SOA (Start of Authority) is reached, at which point the answer is returned.
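To show what actually travels to port 53, here is a sketch that builds a minimal DNS query for an A record. The function name and the query ID are arbitrary choices for illustration; the layout (12-byte header, length-prefixed labels, type and class) follows the standard DNS message format.

```python
import struct


def build_dns_query(hostname, query_id=0x1234):
    """Build a DNS query asking for the A record of `hostname`."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed with its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A record), QCLASS = 1 (Internet).
    return header + qname + struct.pack("!HH", 1, 1)


query = build_dns_query("google.com")
print(len(query), query.hex())
```

Sending it would be a matter of creating a `socket.SOCK_DGRAM` socket and calling `sendto(query, (dns_server_ip, 53))`, where `dns_server_ip` is whatever resolver the system is configured to use.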

Opening of a Socket

Once the browser receives the IP address of the destination server, it combines this with the port number specified in the URL (where HTTP defaults to port 80 and HTTPS to port 443). The browser then makes a call to the system library function named socket, requesting a TCP socket stream using AF_INET or AF_INET6 and SOCK_STREAM.
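In Python the same system call is exposed through the `socket` module. This sketch only creates the socket and picks the port; the helper names and the sample address (from a documentation range) are assumptions, and nothing is connected.

```python
import socket

DEFAULT_PORTS = {"http": 80, "https": 443}


def destination(ip_address, scheme, port=None):
    """Pair the resolved IP address with the explicit or default port."""
    return (ip_address, port if port is not None else DEFAULT_PORTS[scheme])


def open_tcp_socket(family=socket.AF_INET):
    """Create an unconnected TCP stream socket (pass AF_INET6 for IPv6)."""
    return socket.socket(family, socket.SOCK_STREAM)


sock = open_tcp_socket()
print(destination("192.0.2.1", "https"))  # ('192.0.2.1', 443)
sock.close()
```

Calling `sock.connect(destination(...))` is what would kick off the layer-by-layer processing described next.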

Transport Layer Processing:

  • This request is first processed by the Transport Layer, where a TCP segment is crafted. The destination port is added to the header, and a source port is chosen from within the kernel’s dynamic port range (as specified by ip_local_port_range in Linux).

Network Layer Processing:

  • This segment is then sent to the Network Layer, which wraps it in an additional IP header. The IP addresses of both the destination server and the current machine are inserted to form a packet.

Link Layer Processing:

  • The packet next arrives at the Link Layer, where a frame header is added. This header includes the MAC address of the machine’s NIC (Network Interface Card) as well as the MAC address of the gateway (local router). If the kernel does not know the MAC address of the gateway, it must broadcast an ARP query to find it.

At this point, the packet is ready to be transmitted through one of the following methods:

  • Ethernet
  • WiFi
  • Cellular Data Network

For most home or small business Internet connections, the packet will pass from your computer, possibly through a local network, and then through a modem (Modulator/Demodulator). This modem converts digital 1’s and 0’s into an analog signal suitable for transmission over telephone, cable, or wireless telephony connections. On the other end of the connection, another modem converts the analog signal back into digital data for processing by the next network node, where the from and to addresses would be analyzed further.

In contrast, larger businesses and some newer residential connections will use fiber or direct Ethernet connections, allowing the data to remain digital and be passed directly to the next network node for processing.

Eventually, the packet will reach the router managing the local subnet. From there, it will continue to travel to the autonomous system’s (AS) border routers, traverse other ASes, and finally arrive at the destination server. Each router along the way extracts the destination address from the IP header and routes it to the appropriate next hop. The time to live (TTL) field in the IP header is decremented by one for each router that processes it. The packet will be dropped if the TTL field reaches zero or if the current router has no space in its queue (which may occur due to network congestion).

This send and receive process happens multiple times following the TCP connection flow:

  1. The client chooses an Initial Sequence Number (ISN) and sends a packet to the server with the SYN bit set to indicate it is setting the ISN.
  2. The server receives the SYN and, if it is agreeable, performs the following:
    • Chooses its own initial sequence number.
    • Sets the SYN bit to indicate it is choosing its ISN.
    • Copies the (client ISN + 1) to its ACK field and adds the ACK flag to indicate it is acknowledging receipt of the first packet.
  3. The client acknowledges the connection by sending a packet that:
    • Increases its own sequence number.
    • Increases the receiver acknowledgment number.
    • Sets the ACK field.
  4. Data Transfer: Data is transferred as follows:
    • As one side sends N data bytes, it increases its sequence number (SEQ) by that number.
    • When the other side acknowledges receipt of that packet (or a string of packets), it sends an ACK packet with the acknowledgment (ACK) value equal to the last received sequence from the other side.
  5. Closing the Connection: To close the connection:
    • The side initiating the closure sends a FIN packet.
    • The other side acknowledges the FIN packet and sends its own FIN.
    • The initiating side acknowledges the other side’s FIN with an ACK.
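The sequence and acknowledgment numbers in the opening handshake can be traced with a few lines of code. This is a pure simulation with made-up ISNs; the function name and the dictionary layout are just for illustration.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap around at 32 bits


def three_way_handshake(client_isn, server_isn):
    """Return the three segments exchanged to open a TCP connection."""
    syn = {"flags": {"SYN"}, "seq": client_isn, "ack": None}
    syn_ack = {
        "flags": {"SYN", "ACK"},
        "seq": server_isn,
        "ack": (client_isn + 1) % SEQ_SPACE,   # acknowledges the client's SYN
    }
    ack = {
        "flags": {"ACK"},
        "seq": (client_isn + 1) % SEQ_SPACE,
        "ack": (server_isn + 1) % SEQ_SPACE,   # acknowledges the server's SYN
    }
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(sorted(segment["flags"]), segment["seq"], segment["ack"])
```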

[Figure: Sequence diagram of opening a socket]

TLS Handshake

  • The client computer sends a ClientHello message to the server, which includes its Transport Layer Security (TLS) version, a list of available cipher algorithms, and compression methods.
  • In response, the server replies with a ServerHello message that specifies the TLS version, the selected cipher, the selected compression methods, and the server's public certificate signed by a Certificate Authority (CA). This certificate contains a public key that will be used by the client to encrypt the remainder of the handshake until a symmetric key can be agreed upon.
  • The client verifies the server's digital certificate against its list of trusted CAs. If trust can be established based on the CA, the client generates a string of pseudo-random bytes and encrypts this string using the server's public key. These random bytes will be used to determine the symmetric key.
  • The server decrypts the random bytes using its private key and utilizes these bytes to generate its own copy of the symmetric master key.
  • The client sends a Finished message to the server, encrypting a hash of the transmission that has occurred up to this point with the symmetric key.
  • The server generates its own hash and then decrypts the hash sent by the client to verify that it matches. If the hashes match, the server sends its own Finished message back to the client, which is also encrypted with the symmetric key.
  • From this point forward, the TLS session transmits application (HTTP) data encrypted with the agreed-upon symmetric key.

This handshake process establishes a secure connection between the client and server, ensuring that data transmitted over the connection is protected from eavesdropping and tampering.
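From an application's point of view, the whole handshake is usually a single library call. The sketch below uses Python's `ssl` module; the helper names are mine, and `negotiated_parameters` needs network access, so it is defined but not called here. The version and key exchange the library negotiates may differ from the exact steps outlined above.

```python
import socket
import ssl


def make_client_context():
    """Create a context that trusts the system CAs and checks hostnames."""
    return ssl.create_default_context()


def negotiated_parameters(host, port=443, timeout=5.0):
    """Run a handshake with `host` and report the TLS version and cipher."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # wrap_socket performs the handshake, including certificate checks.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```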

If a Packet is Dropped

Sometimes, due to network congestion or flaky hardware connections, TLS packets may be dropped before reaching their final destination. In such cases, the sender must decide how to react. The algorithm governing this response is known as TCP congestion control. The specific implementation can vary depending on the sender, with the most common algorithms being Cubic on newer operating systems and New Reno on many others.

  • The client chooses a congestion window based on the maximum segment size (MSS) of the connection.
  • For each packet acknowledged, the congestion window grows by one MSS, which roughly doubles it every round trip, until it reaches the 'slow-start threshold.' In some implementations, this threshold is adaptive and can change based on network conditions.
  • Once the slow-start threshold is reached, the window increases additively for each packet acknowledged. If a packet is dropped, the window reduces exponentially until another packet is acknowledged.

This congestion control mechanism helps optimize network performance and stability, ensuring that data can be transmitted efficiently while minimizing the impact of packet loss.
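The growth pattern is easier to see in a toy simulation. This sketch is a simplified, Reno-style model counted in MSS units per round trip, not Cubic and not any real stack's implementation; the function name, the halving rule on loss, and the sample parameters are assumptions.

```python
def simulate_cwnd(rounds, ssthresh, loss_rounds=()):
    """Return the congestion window (in MSS) at the start of each round trip."""
    cwnd = 1
    history = []
    for rnd in range(rounds):
        history.append(cwnd)
        if rnd in loss_rounds:
            ssthresh = max(cwnd // 2, 2)     # loss: halve the threshold
            cwnd = ssthresh                  # and resume from there
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double per round trip
        else:
            cwnd += 1                        # congestion avoidance: additive
    return history


print(simulate_cwnd(8, ssthresh=8, loss_rounds={5}))
# [1, 2, 4, 8, 9, 10, 5, 6]
```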

HTTP Protocol

If the web browser used was developed by Google, instead of sending a standard HTTP request to retrieve a page, it may attempt to negotiate an "upgrade" from HTTP to the SPDY protocol with the server.

If the client is using the HTTP protocol and does not support SPDY, it sends a request to the server in the following format:

GET / HTTP/1.1
Host: google.com
Connection: close
[other headers]

Here, [other headers] refers to a series of colon-separated key-value pairs formatted according to the HTTP specification and separated by single newlines. This assumes that the web browser is free of bugs that violate the HTTP specification and that it is using HTTP/1.1. If it were using a different version, such as HTTP/1.0 or HTTP/0.9, it might not include the Host header in the request.

HTTP/1.1 defines the "close" connection option for the sender to signal that the connection will be closed after the response is completed. For example:

Connection: close


HTTP/1.1 applications that do not support persistent connections MUST include the "close" connection option in every message.

After sending the request and headers, the web browser sends a single blank newline to the server to indicate that the content of the request is complete.
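Put together, the bytes sent for this request can be assembled as below. The function name is hypothetical; the one detail worth noting is that HTTP/1.1 lines end with a carriage return plus line feed, and the blank line that ends the headers is just one more such pair.

```python
def build_get_request(host, path="/", extra_headers=None):
    """Serialize a minimal HTTP/1.1 GET request."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # Each line ends with CRLF; an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


print(build_get_request("google.com"))
# b'GET / HTTP/1.1\r\nHost: google.com\r\nConnection: close\r\n\r\n'
```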

The server then responds with a response code that denotes the status of the request, structured as follows:

200 OK
[response headers]

This is followed by a single newline and then the payload containing the HTML content of www.google.com. The server may either close the connection or, if requested by the headers sent by the client, keep the connection open for reuse in further requests.

If the HTTP headers sent by the web browser contained sufficient information for the web server to determine whether the version of the file cached by the web browser has been unmodified since the last retrieval (for example, if the web browser sent a previously received ETag value in an If-None-Match header), the server may instead respond with:

304 Not Modified
[response headers]

This response will have no payload, and the web browser will retrieve the HTML from its cache.
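On the server side, the choice between 200 and 304 can be sketched as a simple comparison. This assumes the browser echoes the cached validator in an If-None-Match request header; the `respond` function and the sample ETag are hypothetical.

```python
def respond(request_headers, current_etag, body):
    """Return (status, headers, payload) for a possibly conditional GET."""
    if request_headers.get("If-None-Match") == current_etag:
        # The cached copy is still valid: no payload is sent.
        return 304, {"ETag": current_etag}, b""
    headers = {"ETag": current_etag, "Content-Length": str(len(body))}
    return 200, headers, body


page = b"<html>...</html>"
print(respond({}, '"v1"', page)[0])                         # 200
print(respond({"If-None-Match": '"v1"'}, '"v1"', page)[0])  # 304
```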

After parsing the HTML, the web browser (and server) repeats this process for every resource (image, CSS, favicon.ico, etc.) referenced in the HTML page. In these cases, instead of GET / HTTP/1.1, the request will be structured as:

GET /$(URL relative to www.google.com) HTTP/1.1


If the HTML references a resource on a different domain than www.google.com, the web browser returns to the steps involved in resolving the other domain, following all steps up to this point for that domain. The Host header in the request will be set to the appropriate server name instead of google.com.

HTTP Server Request Handling

The HTTPD (HTTP Daemon) server is responsible for handling requests and responses on the server side. The most common HTTPD servers include Apache and Nginx for Linux, as well as IIS for Windows.

  1. Receiving the Request: The HTTPD server receives the incoming request from the client.
  2. Breaking Down the Request: The server analyzes the request and extracts the following parameters:
    • HTTP Request Method: This could be one of several methods, including GET, HEAD, POST, PUT, PATCH, DELETE, CONNECT, OPTIONS, or TRACE. In the case of a URL entered directly into the address bar, the method will typically be GET.
    • Domain: In this case, the domain is google.com.
    • Requested Path/Page: Here, the requested path is /, indicating that no specific page was requested; thus, / is treated as the default path.
  3. Verifying the Virtual Host: The server checks whether a Virtual Host is configured for google.com.
  4. Method Verification: The server verifies that google.com can accept GET requests.
  5. Client Permission Check: The server checks if the client is allowed to use this method based on criteria such as IP address, authentication, etc.
  6. Request Rewriting: If the server has a rewrite module installed (such as mod_rewrite for Apache or URL Rewrite for IIS), it attempts to match the request against any configured rules. If a matching rule is found, the server rewrites the request according to that rule.
  7. Content Retrieval: The server retrieves the content that corresponds to the request. In this case, it will typically default to the index file since the request path is /. While there are cases that can override this behavior, using the index file is the most common method.
  8. File Parsing and Processing: The server parses the index file according to the designated handler. If Google is using PHP, for example, the server will utilize PHP to interpret the index file and stream the output back to the client.

By following these steps, the HTTPD server efficiently processes incoming requests and returns the appropriate responses to the client.
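A stripped-down version of these steps might look like the sketch below. The virtual host table, the allowed methods, the document root, and the choice to answer 404 for an unknown host are all assumptions for illustration; real servers such as Apache or Nginx are configured rather than hard-coded, and rewriting and permission checks are left out.

```python
ALLOWED_METHODS = {"GET", "HEAD"}                 # hypothetical per-site policy
VIRTUAL_HOSTS = {"google.com": "/srv/google"}     # hypothetical document roots


def parse_request(raw):
    """Split raw request bytes into method, path, version, and headers."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


def handle(raw):
    """Return (status, file_to_serve) for a raw request."""
    method, path, _version, headers = parse_request(raw)
    root = VIRTUAL_HOSTS.get(headers.get("host", ""))
    if root is None:
        return 404, None                  # no virtual host configured
    if method not in ALLOWED_METHODS:
        return 405, None                  # method not accepted
    if path == "/":
        path = "/index.html"              # default to the index file
    return 200, root + path


print(handle(b"GET / HTTP/1.1\r\nHost: google.com\r\nConnection: close\r\n\r\n"))
```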

Browser

The primary functionality of a browser is to present the web resources you choose by requesting them from a server and displaying them in the browser window. The resource is typically an HTML document but may also include PDFs, images, or other types of content. The location of the resource is specified by the user using a URI (Uniform Resource Identifier).

The way a browser interprets and displays HTML files is defined by the HTML and CSS specifications, which are maintained by the W3C (World Wide Web Consortium), the standards organization for the web.

Browser user interfaces share many common features, including:

  • An address bar for entering a URI
  • Back and forward buttons for navigation
  • Bookmarking options for saving favorite pages
  • Refresh and stop buttons for refreshing or halting the loading of current documents
  • A home button that takes you to your home page

Browser High-Level Structure

The components of a browser can be broken down as follows:

  • User Interface: This includes the address bar, back/forward buttons, bookmarking menu, and any other part of the browser's display except for the window where the requested page is shown.
  • Browser Engine: The browser engine acts as a bridge between the user interface and the rendering engine, managing actions and interactions.
  • Rendering Engine: Responsible for displaying requested content, the rendering engine parses HTML and CSS, transforming the parsed content into a visual representation on the screen.
  • Networking: This component handles network calls, such as HTTP requests, and utilizes different implementations tailored for various platforms while providing a platform-independent interface.
  • UI Backend: The UI backend is responsible for drawing basic widgets like combo boxes and windows. It exposes a generic interface that is not specific to any platform and relies on the operating system's user interface methods.
  • JavaScript Engine: This engine parses and executes JavaScript code, allowing for dynamic content and interactivity within web pages.
  • Data Storage: This acts as a persistence layer, enabling the browser to save various types of data locally, such as cookies. Browsers also support storage mechanisms like localStorage, IndexedDB, WebSQL, and FileSystem.

Each of these components works together to create a seamless browsing experience, allowing users to access and interact with web resources efficiently.

HTML Parsing

The rendering engine begins retrieving the contents of the requested document from the networking layer, typically in 8 kB chunks. The primary responsibility of the HTML parser is to transform the HTML markup into a structured representation known as a parse tree.

The output tree, referred to as the "parse tree," consists of a hierarchy of DOM (Document Object Model) element and attribute nodes. The DOM serves as the object representation of the HTML document, providing an interface for HTML elements to interact with external scripts, such as JavaScript. The root of this tree is the "Document" object, and prior to any scripting manipulations, the DOM maintains an almost one-to-one correspondence with the original markup.
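To get a feel for the markup-to-tree step, here is a small sketch built on Python's `html.parser`. It is not the HTML5 parsing algorithm that browsers implement; the `TreeBuilder` class, the dictionary node shape, and the short list of void elements are simplifications of my own.

```python
from html.parser import HTMLParser

VOID_ELEMENTS = {"br", "img", "input", "link", "meta", "hr"}  # never have children


class TreeBuilder(HTMLParser):
    """Build a nested dict tree from HTML markup."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "attrs": {}, "children": []}
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self._stack[-1]["children"].append(node)
        if tag not in VOID_ELEMENTS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray end tags.
        for index in range(len(self._stack) - 1, 0, -1):
            if self._stack[index]["tag"] == tag:
                del self._stack[index:]
                break

    def handle_data(self, data):
        if data.strip():
            self._stack[-1]["children"].append(data.strip())


def parse_html(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


tree = parse_html("<html><body><p>Hello<br>world</p></body></html>")
print(tree["children"][0]["tag"])
```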

The Parsing Algorithm

HTML cannot be parsed effectively using traditional top-down or bottom-up parsers due to several factors:

  • Forgiving Nature of the Language: HTML is designed to be lenient with syntax errors, allowing browsers to display content even when the markup is not perfectly structured.
  • Browser Error Tolerance: Browsers are built to handle common cases of invalid HTML, ensuring that users have a functional experience.
  • Reentrancy of the Parsing Process: In other programming languages, the source remains unchanged during parsing. However, in HTML, dynamic elements (like <script> tags containing document.write() calls) can modify the input during parsing, which necessitates a different approach.

Because of these challenges, browsers employ a custom parser tailored for HTML. The parsing algorithm is thoroughly described in the HTML5 specification and consists of two primary stages: tokenization and tree construction.

Actions When Parsing is Finished

Resources linked from the page, such as CSS stylesheets, images, and JavaScript files, are requested as the parser encounters them, and any that are still outstanding keep loading after parsing ends. Once parsing is complete, the browser marks the document as "interactive" and executes the scripts that are in "deferred" mode, meaning those intended to run after the document has been fully parsed. When the remaining resources have finished loading, the document state is set to "complete," and a "load" event is triggered.
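The order of these steps can be sketched as a small simulation. This is only an illustrative model, not browser code: the finish_parsing function and its log strings are hypothetical, and the DOMContentLoaded step comes from the HTML standard rather than from the paragraph above.

```python
def finish_parsing(deferred_scripts, pending_resources):
    """Return the ordered list of steps a browser takes once the parser is done."""
    log = ["readyState=interactive"]
    # Deferred scripts run in document order, after parsing has finished.
    for script in deferred_scripts:
        log.append(f"run deferred script {script}")
    log.append("event DOMContentLoaded")
    # The load event waits for resources that are still in flight.
    for resource in pending_resources:
        log.append(f"finished loading {resource}")
    log.append("readyState=complete")
    log.append("event load")
    return log


for step in finish_parsing(["app.js"], ["logo.png", "style.css"]):
    print(step)
```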

Importantly, browsers do not generate an "Invalid Syntax" error for HTML pages. Instead, they automatically correct any invalid content and continue processing the document, ensuring that users can view web pages with minimal disruption.
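To make the error tolerance concrete, here is a minimal sketch of two recovery rules, assuming a heavily simplified model: a new <p> implicitly closes an open <p>, and an end tag with no matching open element is ignored. The ForgivingBuilder class and its event strings are hypothetical; real browsers follow the much more detailed rules in the HTML specification.

```python
from html.parser import HTMLParser


class ForgivingBuilder(HTMLParser):
    """Logs how a toy tree builder recovers from malformed markup."""

    def __init__(self):
        super().__init__()
        self.stack = []   # names of currently open elements
        self.events = []  # what the builder decided to do, in order

    def handle_starttag(self, tag, attrs):
        # Recovery rule 1: a new <p> implicitly closes an open <p>.
        if tag == "p" and "p" in self.stack:
            self._close("p", implied=True)
        self.stack.append(tag)
        self.events.append(f"open {tag}")

    def handle_endtag(self, tag):
        if tag in self.stack:
            self._close(tag, implied=False)
        else:
            # Recovery rule 2: an end tag with nothing to close is dropped.
            self.events.append(f"ignore stray </{tag}>")

    def _close(self, tag, implied):
        # Pop every element above the match, then the match itself.
        while self.stack:
            closed = self.stack.pop()
            suffix = " (implied)" if implied or closed != tag else ""
            self.events.append(f"close {closed}{suffix}")
            if closed == tag:
                break

    def close(self):
        super().close()
        # End of input: whatever is still open gets closed without an error.
        while self.stack:
            self.events.append(f"close {self.stack.pop()} (implied)")


builder = ForgivingBuilder()
builder.feed("<p>one<span>text<p>two</i>")
builder.close()
print("\n".join(builder.events))
```

No exception is raised for the unclosed or mismatched tags; the builder simply records the repair it made and moves on.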

CSS Interpretation

The process of CSS interpretation involves several key steps:

  ‱ Parsing CSS Files: The browser parses external CSS files, the contents within <style> tags, and the values within style attributes. This parsing follows the "CSS lexical and syntax grammar," which defines the rules and structure of valid CSS.
  • Creating StyleSheet Objects: Each parsed CSS file is transformed into a StyleSheet object. Each StyleSheet object encapsulates the CSS rules, including selectors and the corresponding CSS declarations. This structured representation allows for efficient access and manipulation of styles.
  ‱ Parsing Techniques: The CSS parser can utilize either top-down or bottom-up parsing techniques, depending on the specific parser generator employed. These techniques determine how the parser reads and processes the CSS rules, affecting the efficiency and accuracy of the parsing process.

Through this interpretation, the browser builds a comprehensive understanding of how to apply styles to the HTML elements in the DOM, facilitating the rendering of the web page with the intended visual presentation.
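As a rough illustration of turning CSS text into a StyleSheet object, the sketch below uses regular expressions rather than a grammar-driven parser. The StyleSheet and Rule classes and the parse_css function are hypothetical names, and the code assumes plain rule sets only (no at-rules, nested blocks, or braces inside strings).

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    selectors: list     # e.g. ["h1", "h2"]
    declarations: dict  # e.g. {"color": "red"}


@dataclass
class StyleSheet:
    rules: list = field(default_factory=list)


COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def parse_css(text):
    """Parse flat 'selector { property: value; }' rule sets into a StyleSheet."""
    sheet = StyleSheet()
    text = COMMENT_RE.sub("", text)
    for match in RULE_RE.finditer(text):
        selectors = [s.strip() for s in match.group(1).split(",") if s.strip()]
        declarations = {}
        for declaration in match.group(2).split(";"):
            if ":" not in declaration:
                continue  # skip empty or malformed declarations
            prop, value = declaration.split(":", 1)
            declarations[prop.strip().lower()] = value.strip()
        sheet.rules.append(Rule(selectors, declarations))
    return sheet


sheet = parse_css("/* headings */ h1, h2 { color: red; margin: 0 } p{font-size:12px;}")
for rule in sheet.rules:
    print(rule.selectors, rule.declarations)
```

Keeping selectors and declarations as separate fields is what lets later stages match selectors against DOM nodes and then look up the declared values.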

Page Rendering

The rendering process of a web page involves several structured steps:

  • Creating the Frame Tree: The rendering engine constructs a 'Frame Tree' or 'Render Tree' by traversing the DOM nodes and calculating the computed CSS styles for each node. This tree represents the visual structure of the page.
  • Calculating Preferred Width: The preferred width for each node in the Frame Tree is calculated in a bottom-up manner. This involves summing the preferred widths of the child nodes along with the node's horizontal margins, borders, and padding.
  • Calculating Actual Width: The actual width of each node is determined in a top-down approach by distributing the available width among its children based on their needs.
  • Calculating Height: The height of each node is calculated bottom-up by applying text wrapping and summing the heights of the child nodes along with the node's margins, borders, and padding.
  • Determining Node Coordinates: The coordinates of each node are computed using the width and height information gathered in the previous steps.
  ‱ Handling Complex Elements: More intricate calculations are performed for elements that are floated, positioned absolutely or relatively, or that employ other complex features. For further details, refer to the CSS2 specification and the current CSS work.
  • Creating Layers: Layers are created to describe which parts of the page can be animated together without requiring re-rasterization. Each frame/render object is assigned to a specific layer.
  • Allocating Textures: Textures are allocated for each layer of the page to optimize rendering performance.
  • Executing Drawing Commands: The frame/render objects for each layer are traversed, and drawing commands are executed for their respective layers. This rendering can be handled by the CPU or directly drawn on the GPU using technologies like D2D (Direct2D) or SkiaGL.
  ‱ Reusing Calculated Values: The rendering process can leverage calculated values from the previous rendering of the webpage, enabling more efficient incremental changes that require less computational work.
  ‱ Compositing Layers: The final page layers are sent to the compositing process, where they are combined with other visible content, such as the browser chrome, iframes, and add-on panels.
  ‱ Finalizing Render Commands: The final layer positions are computed, and composite commands are issued via graphics APIs like Direct3D or OpenGL. The GPU command buffers are flushed to the GPU for asynchronous rendering, and the completed frame is sent to the window server for display.
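The width, height, and coordinate passes from the list above can be sketched for the simplest case. This is a toy model under loud assumptions: every box is a block stacked vertically, padding stands in for margins and borders as well, text uses fixed-size glyphs, and the preferred-width pass is omitted. Box, CHAR_W, LINE_H, and the assign_* functions are hypothetical names.

```python
import math
from dataclasses import dataclass, field

CHAR_W, LINE_H = 8, 16  # assumed fixed glyph width and line height, in pixels


@dataclass
class Box:
    text: str = ""
    padding: int = 0
    children: list = field(default_factory=list)
    x: int = 0
    y: int = 0
    width: int = 0
    height: int = 0


def assign_widths(box, available):
    """Top-down: a block box takes all the width its parent offers."""
    box.width = available
    for child in box.children:
        assign_widths(child, available - 2 * box.padding)


def assign_heights(box):
    """Bottom-up: wrap the text, then add the children's heights and padding."""
    content_width = box.width - 2 * box.padding
    chars_per_line = max(1, content_width // CHAR_W)
    text_lines = math.ceil(len(box.text) / chars_per_line)
    children_height = sum(assign_heights(child) for child in box.children)
    box.height = text_lines * LINE_H + children_height + 2 * box.padding
    return box.height


def assign_positions(box, x=0, y=0):
    """Place each child below the previous one, inside the parent's padding."""
    box.x, box.y = x, y
    children_height = sum(child.height for child in box.children)
    # Children start below the box's own text, if it has any.
    cursor = y + box.height - box.padding - children_height
    for child in box.children:
        assign_positions(child, x + box.padding, cursor)
        cursor += child.height


root = Box(padding=10, children=[Box(text="x" * 30), Box(text="y" * 5, padding=4)])
assign_widths(root, 200)
assign_heights(root)
assign_positions(root)
for box in [root, *root.children]:
    print(box.x, box.y, box.width, box.height)
```

The passes run in this order because each one depends on the previous: heights need the final widths to wrap text, and coordinates need the final heights to stack boxes.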

GPU Rendering

  ‱ During the rendering process, graphical computing tasks can run either on the general-purpose CPU or on the specialized graphics processor, the GPU.
  • When leveraging the GPU for graphical rendering computations, the graphical software layers divide the workload into multiple smaller tasks. This approach allows them to take full advantage of the GPU's massive parallelism, which is particularly effective for the floating-point calculations required in the rendering process.
  • The GPU excels in handling numerous operations simultaneously, making it well-suited for rendering complex visual content efficiently and rapidly. This parallel processing capability significantly enhances performance, especially in applications involving high-resolution graphics, animations, and real-time rendering.
  • As a result, using the GPU not only speeds up the rendering process but also enables more sophisticated visual effects and smoother user experiences in modern web applications and graphics-intensive software.
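The idea of dividing rendering work into many small independent tasks can be sketched as follows. This is an analogy only: CPU threads from Python's concurrent.futures stand in for GPU parallelism, and the tile size, the shade function, and the gradient it computes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT, TILE = 64, 32, 16  # assumed frame and tile sizes, in pixels


def shade(x, y):
    """Per-pixel work: a horizontal gradient computed with floating-point math."""
    return round(255 * x / (WIDTH - 1))


def render_tile(origin):
    """Compute one tile; it needs no data from any other tile."""
    ox, oy = origin
    pixels = [[shade(x, y) for x in range(ox, ox + TILE)]
              for y in range(oy, oy + TILE)]
    return origin, pixels


def render():
    """Split the frame into tiles, compute them concurrently, and reassemble."""
    origins = [(x, y) for y in range(0, HEIGHT, TILE) for x in range(0, WIDTH, TILE)]
    frame = [[0] * WIDTH for _ in range(HEIGHT)]
    with ThreadPoolExecutor() as pool:
        for (ox, oy), pixels in pool.map(render_tile, origins):
            for dy, row in enumerate(pixels):
                frame[oy + dy][ox:ox + TILE] = row
    return frame


frame = render()
print(len(frame), len(frame[0]), frame[0][0], frame[0][WIDTH - 1])
```

Because no tile depends on another, the work can be spread over as many workers as are available, which is the property GPUs exploit on a much larger scale.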

Benefits of GPU Rendering

This image is also rendered by the GPU

Post-Rendering and User-Induced Execution

After the rendering process is complete, the browser executes JavaScript code triggered by various events, such as timing mechanisms (like a Google Doodle animation) or user interactions (e.g., typing a query into the search box and receiving suggestions).

  • Plugins: Additionally, plugins such as Flash or Java may also execute, although they typically do not run at this point on the Google homepage.
  ‱ Network Requests: Scripts can initiate further network requests, fetching additional resources or data as needed.
  ‱ DOM Modifications: Scripts can also modify the existing page or its layout, which can trigger another round of page rendering and painting.

This dynamic capability allows content to change in real time in response to user actions or other conditions. The interaction between JavaScript execution and the rendering engine is what lets developers build applications that respond to user input and changing contexts.
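How a script-driven change leads to another, smaller round of rendering can be sketched with a dirty flag. This is a minimal model, not how any particular engine is implemented: the Element and Page classes are hypothetical, and "painting" is reduced to counting the elements that needed it.

```python
class Element:
    """A stand-in for a DOM node that remembers whether it changed since the last paint."""

    def __init__(self, text):
        self._text = text
        self.dirty = True  # never painted yet

    @property
    def text(self):
        return self._text

    @text.setter
    def text(self, value):
        if value != self._text:
            self._text = value
            self.dirty = True  # a script changed the node: schedule a repaint


class Page:
    def __init__(self, elements):
        self.elements = elements

    def render_frame(self):
        """Repaint only the dirty elements and return how many were repainted."""
        repainted = 0
        for element in self.elements:
            if element.dirty:
                element.dirty = False
                repainted += 1
        return repainted


page = Page([Element("Search"), Element("Results"), Element("Footer")])
print(page.render_frame())  # first frame: everything is painted
page.elements[1].text = "Results for: cats"
print(page.render_frame())  # only the modified element is repainted
```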