Mastering Request Interceptions in Puppeteer

saairaam

Saairaam Prasad

Posted on March 4, 2024


Puppeteer, a Node.js library developed by Google, offers a powerful API for controlling browsers, headless or with a visible window, over the DevTools Protocol. One standout feature of Puppeteer is its ability to intercept network requests, letting developers modify outgoing requests, block them, or answer them with custom responses during web scraping or automation tasks.

Understanding Request Interception

Request interception in Puppeteer lets developers observe, modify, or block outgoing requests, or answer them with custom responses, while page.on('response') lets them inspect the responses that come back. This is useful for speeding up page loads, simulating failed requests, mocking APIs, or handling dynamically loaded content.

Enabling Request Interception

To use request interception in Puppeteer, follow these steps:

  1. Enable interception on the page with page.setRequestInterception(true).
  2. Listen for the 'request' event, which Puppeteer emits for every network request the page makes, and resolve each request with continue(), abort(), or respond().
  3. Optionally, listen for the 'response' event with page.on('response') to inspect what comes back.
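await page.setRequestInterception(true);

// Fires for every request the page makes. Each intercepted request stays
// paused until you call continue(), abort(), or respond() exactly once.
page.on('request', (request) => {
  // Your custom logic here
  request.continue();
});

// Fires for every response; this works with or without interception enabled.
page.on('response', (response) => {
  // Your response handling logic here
});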
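Because an intercepted request stays paused until the handler resolves it, a listener that forgets a branch (or throws) can stall the page, and resolving the same request twice makes Puppeteer throw. A minimal sketch of one way to guard against both, assuming a hypothetical createInterceptHandler helper (not a Puppeteer API) that takes a decision function and always falls back to continue():

```javascript
// Hypothetical helper (not part of Puppeteer): wraps a decision function so
// every intercepted request is resolved exactly once.
// decide(request) may return { action: "abort", errorCode },
// { action: "respond", response }, { action: "continue", overrides },
// or nothing at all (treated as a plain continue).
function createInterceptHandler(decide) {
  return async (request) => {
    let decision;
    try {
      decision = decide(request) || { action: "continue" };
    } catch {
      // If the decision logic throws, let the request through rather than stall it.
      decision = { action: "continue" };
    }
    switch (decision.action) {
      case "abort":
        // "failed" is also Puppeteer's default abort error code.
        return request.abort(decision.errorCode || "failed");
      case "respond":
        return request.respond(decision.response);
      default:
        return request.continue(decision.overrides);
    }
  };
}
```

Usage would look like page.on("request", createInterceptHandler((req) => (req.url().endsWith(".png") ? { action: "abort" } : undefined))). Keeping a single listener per page also avoids two handlers racing to resolve the same request.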

Modifying Requests

Request interception lets you change an outgoing request before it is sent: request.continue() accepts overrides for url, method, postData, and headers, so you can set custom headers, switch the request method, or adjust the request payload.

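// Requires page.setRequestInterception(true) to have been called first.
page.on('request', (request) => {
  // Copy the existing headers instead of mutating them, then add Authorization.
  const headers = { ...request.headers(), authorization: 'Bearer YOUR_TOKEN' };
  request.continue({ headers });
});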
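Adding the token to every request would also send it to third-party hosts the page loads from, such as CDNs or analytics services. A minimal sketch that adds the header only for one origin; API_ORIGIN (https://api.example.com) and TOKEN are hypothetical placeholders, and authHeaderHandler is an illustrative name, not a Puppeteer API:

```javascript
// Hypothetical values: replace with your own API origin and token.
const API_ORIGIN = "https://api.example.com";
const TOKEN = "YOUR_TOKEN";

// Returns a request listener that adds an Authorization header only for
// requests whose URL belongs to `origin`; everything else passes unchanged.
function authHeaderHandler(origin, token) {
  return (request) => {
    const sameOrigin = new URL(request.url()).origin === origin;
    if (!sameOrigin) {
      return request.continue();
    }
    // Copy the captured headers and add (or overwrite) Authorization.
    const headers = { ...request.headers(), authorization: `Bearer ${token}` };
    return request.continue({ headers });
  };
}
```

With interception enabled, you would register it with page.on("request", authHeaderHandler(API_ORIGIN, TOKEN)). Comparing parsed origins rather than using url().includes() avoids matching look-alike hosts such as api.example.com.evil.test.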

Blocking Requests

Another powerful aspect of request interception is the ability to block requests that match a condition, such as a URL pattern, by calling request.abort() instead of request.continue().

page.on('request', (request) => {
  if (request.url().includes('blocked-resource')) {
    request.abort();
  } else {
    request.continue();
  }
});
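Matching on URL substrings can be brittle; request.resourceType() lets you block whole categories of requests instead. A sketch that skips images, fonts, and media to speed up scraping, assuming you do not need them for the data you extract. Stylesheets are left alone here because some sites show or hide content with CSS. The names BLOCKED_TYPES and blockHeavyResources are illustrative:

```javascript
// Resource types to skip; values match Puppeteer's request.resourceType().
const BLOCKED_TYPES = new Set(["image", "font", "media"]);

// Request listener that aborts blocked resource types and continues the rest.
function blockHeavyResources(request) {
  if (BLOCKED_TYPES.has(request.resourceType())) {
    // "blockedbyclient" is one of the error codes request.abort() accepts.
    return request.abort("blockedbyclient");
  }
  return request.continue();
}
```

After page.setRequestInterception(true), register it with page.on("request", blockHeavyResources).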

Real-world Examples

Let's explore practical use cases for request interception in Puppeteer:

  1. Dynamic Content Loading
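page.on('request', (request) => {
  // Keep the handler synchronous: resolve each request right away
  // instead of awaiting page state inside the listener.
  request.continue();
});

// In the main flow, wait for the dynamic-content response to arrive,
// then for the element it renders.
await page.waitForResponse((response) => response.url().includes('dynamic-content'));
await page.waitForSelector('.loaded-element');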
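Puppeteer already provides page.waitForResponse(predicate) for this. To make the pattern concrete, here is a hand-rolled equivalent (a sketch, not Puppeteer's actual implementation) that listens for "response" events on any EventEmitter-style object, resolves on the first match, and removes its listener on success or timeout. The name waitForMatchingResponse is hypothetical:

```javascript
// Hypothetical helper: resolves with the first "response" event that
// satisfies `predicate`, or rejects after `timeoutMs`.
function waitForMatchingResponse(emitter, predicate, timeoutMs = 30000) {
  return new Promise((resolve, reject) => {
    const onResponse = (response) => {
      if (predicate(response)) {
        cleanup();
        resolve(response);
      }
    };
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`No matching response within ${timeoutMs} ms`));
    }, timeoutMs);
    // Always detach the listener and timer so nothing leaks.
    function cleanup() {
      clearTimeout(timer);
      emitter.off("response", onResponse);
    }
    emitter.on("response", onResponse);
  });
}
```

Since a Puppeteer page exposes on() and off(), usage would be await waitForMatchingResponse(page, (r) => r.url().includes("dynamic-content")) followed by page.waitForSelector(".loaded-element"). Attaching the listener before triggering the navigation or click avoids missing a fast response.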
  2. API Mocking
page.on('request', (request) => {
  if (request.url().includes('mock-api')) {
    request.respond({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ mockData: true }),
    });
  } else {
    request.continue();
  }
});
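When several endpoints need mocking, a lookup table keeps the handler short. A sketch, assuming hypothetical URL fragments (/api/user, /api/flags) and payloads; mockApi is an illustrative name, and unmatched requests go to the network as usual:

```javascript
// Hypothetical mock table: URL fragment -> status and JSON payload to return.
const MOCKS = [
  { match: "/api/user", status: 200, body: { id: 1, name: "Test User" } },
  { match: "/api/flags", status: 200, body: { newCheckout: false } },
];

// Request listener: fulfil matching requests with canned JSON, continue the rest.
function mockApi(request) {
  const mock = MOCKS.find((m) => request.url().includes(m.match));
  if (!mock) {
    return request.continue();
  }
  return request.respond({
    status: mock.status,
    contentType: "application/json",
    body: JSON.stringify(mock.body),
  });
}
```

The first matching entry wins, so list more specific fragments before broader ones.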

Note: page.on('request') only sees requests issued by that page, including the XHR and fetch calls made by the page's own scripts. Requests sent outside the page, such as ones your Node script makes with its own fetch(), are never intercepted, and requests from other targets, such as service workers, may not appear either.

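Practical Implementation on IRCTC
Now let's apply this to the IRCTC website. Instead of modifying requests in flight, this example listens for responses, captures the headers the site sends with one of its API requests, and replays that request from Node with a body of our own.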

// Requires Node 18+ for the global fetch() used below.
const puppeteer = require("puppeteer-extra");
const pluginStealth = require("puppeteer-extra-plugin-stealth")();
puppeteer.use(pluginStealth);

const scrape = async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
  });
  const page = await browser.newPage();

  await page.goto("https://www.irctc.co.in/nget/train-search", {
    waitUntil: "networkidle0",
  });

  // Fill in the search form: destination MAS, origin KRR.
  await page.type("#destination > span > input", "MAS");
  await page.keyboard.press("ArrowDown");
  await page.keyboard.press("Enter");
  await page.type("#origin > span > input", "KRR");
  await page.keyboard.press("ArrowDown");

  let headers;
  // When the site's own altAvlEnq API call completes, reuse its request
  // headers to send our own request from Node with a different body.
  page.on("response", async (response) => {
    if (
      response
        .request()
        .url()
        .includes(
          "https://www.irctc.co.in/eticketing/protected/mapps1/altAvlEnq/TC"
        )
    ) {
      headers = response.request().headers();
      try {
        const apiRes = await fetch(
          "https://www.irctc.co.in/eticketing/protected/mapps1/altAvlEnq/TC",
          {
            headers,
            body: '{"concessionBooking":false,"srcStn":"MAS","destStn":"MMCT","jrnyClass":"","jrnyDate":"20240225","quotaCode":"GN","currentBooking":"false","flexiFlag":false,"handicapFlag":false,"ticketType":"E","loyaltyRedemptionBooking":false,"ftBooking":false}',
            method: "POST",
            credentials: "omit",
          }
        );
        console.log(await apiRes.json());
      } catch (err) {
        // For example, IRCTC returned a login page instead of JSON.
        console.error("Replayed request failed:", err);
      }
    }
  });

  // Confirm the origin selection and submit the search.
  await page.keyboard.press("Enter");
  await page.click("[label='Find Trains']");
};

scrape().catch(console.error);
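The replay step can be factored into a small function, which also makes it testable without a browser. A sketch, assuming a hypothetical replayWithBody helper: it copies the captured headers, drops content-length if present (the new body may have a different length), sends the POST, and raises a clear error when the server answers with something other than JSON, such as a login page. fetchImpl defaults to Node 18+'s global fetch:

```javascript
// Hypothetical helper: replays a captured request from Node with a new body.
async function replayWithBody(url, capturedHeaders, body, fetchImpl = fetch) {
  // Copy headers, dropping content-length: the new body may differ in size.
  const headers = {};
  for (const [name, value] of Object.entries(capturedHeaders)) {
    if (name.toLowerCase() !== "content-length") {
      headers[name] = value;
    }
  }
  const res = await fetchImpl(url, {
    method: "POST",
    headers,
    body: typeof body === "string" ? body : JSON.stringify(body),
  });
  const type = res.headers.get("content-type") || "";
  if (!res.ok || !type.includes("application/json")) {
    // A login or error page usually comes back as HTML, not JSON.
    throw new Error(`Unexpected response: ${res.status} ${type}`);
  }
  return res.json();
}
```

Inside the response listener, the fetch call could then become console.log(await replayWithBody(url, response.request().headers(), body)), with the body passed either as a JSON string or as a plain object.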

In the above code, we attach the response listener and then submit the search form, in which we entered KRR as the origin and MAS as the destination. The replayed fetch call, however, sends its own body with MAS as the source and MMCT as the destination. The API therefore responds according to our body rather than the form, and we can work with that data directly.
Note: The code above does not always work, because IRCTC sometimes asks you to log in. If that happens, wait a while and try again.

Conclusion

Delving into Puppeteer's request interception unlocks a realm of possibilities for web automation and testing. With the ability to tweak headers, block requests, mock API responses, or simulate failing endpoints, you have the tools to orchestrate a symphony of digital interactions.

So, dive in, explore, and let your creativity soar. Whether you're a seasoned developer or new to web automation, Puppeteer's request interception offers endless opportunities for innovation. Happy coding!
