Building a Simple Production Error-Logger w/ Node and S3

colin-williams-dev

Posted on April 14, 2024

🤔Problem:

You want to record error logs asynchronously for an app in production, but you don't want to eat up limited disk space. You want your dev team to be able to access the logs easily and quickly. And you don't want to pay for anything you don't have to...😋🏴

💡Solution:

Hook into the process events you are already listening for and logging to stderr/stdout, but also write those logs to a file, ship that file to a hosted service, and then delete the logs from disk.


There were a few things I wanted my logger to satisfy:

  1. Catch and log errors that have not been handled in the code.
  2. Persist the logs so that the dev team has time to find them (the app should not be crashing or terminating).
  3. Don't take up precious disk-space on the deployment.
  4. Render the data in a human readable format with useful information.
  5. Make the data as available as possible (without being completely public*).

*AWS IAM users still pending...


📌Node.js Process and FS

I was working on my application's execution context architecture, setting up lots of try/catches with re-throws (the pattern sketched below), when the above "💡Solution" thought occurred to me.
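As a quick illustration of that pattern, here's a minimal sketch (the loadConfig function and its config path are hypothetical, not from the actual app): a lower-level catch logs its local context, then re-throws so the error still propagates up to the top-level process handlers.

import fs from "node:fs";

// Hypothetical example of the catch/re-throw pattern: log local
// context, then re-throw so the error bubbles up to the top-level
// process handlers shown below.
async function loadConfig(configPath) {
  try {
    const raw = await fs.promises.readFile(configPath, "utf-8");
    return JSON.parse(raw);
  } catch (err) {
    console.error(`Failed to load config from ${configPath}:`, err);
    throw err; // re-throw: logged here, resolved upstream
  }
}

I had just written my top level: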

import fs from "fs";

/////////////////////////////////////////////////////////
///// #region top-level node critical-failure catch /////
process.on('uncaughtExceptionMonitor', (err, origin) => {
  console.error(`Critical failure, propagated to top-level from ${origin}, error: `, err);
  /* TODO: create custom monitor class here that will handle application recovery or restart from unhandled critical failure... */
});

process.on('unhandledRejection', (reason, promise) => {
  console.log('Unhandled Rejection at:', promise, 'reason:', reason);
  /* TODO: application logging, throwing an error, or other logic here for uncaught promises */
});

process.on('uncaughtException', (err, origin) => {
  const log = `Caught exception: ${err}\n` +
    `Exception origin: ${origin}\n`;
  console.error("Critical Error -- app is about to explode... \n performing synchronous cleanup.. \n writing crash state to terminal..");
  // Write synchronously to stderr: async writes may never flush before exit
  fs.writeSync(process.stderr.fd, log);
  /* TODO: needs to be pruned on CRON job, otherwise infinitely expands for logs */
  if (process.env.NODE_ENV === "production") {
    fs.appendFileSync(
      "error.log",
      `Logging to ./error.log app critical failure:\n${new Date().toISOString()}\n${log}\n --- \n\n`
    );
  }
  console.error("goodbye.");
});
/////////////////////////////////////////////////////////
// #endregion ///////////////////////////////////////////

With this implementation I was writing to a file in the root called "error.log" (created on the local disk if it doesn't already exist).

With a node backend, I could log what I needed (on node's process events) to the typical stderr/stdout, but I also wanted the logs to persist. I could have just stuffed them into a database somewhere... but I didn't want to pollute the database, and I wanted the logs to be more readily available. I knew I could use node's fs module to write to a file, but I was concerned about eating my production instance's disk space. 😓

I wanted to leverage what I already had available and limit my costs as much as possible. I thought about where else I could put these logs with low overhead, provisioning as few new resources as possible. I started with technologies I had already implemented for my project:

  • GitHub (where the development git repo lives)
    • Pages (free)❌
  • AWS (the EC2 where the production deployment will eventually live)
    • S3 (free)✅

📌AWS S3 HTML/JS

I knew I could quickly provision an index.html and throw in a <script> that leverages the JS Fetch API to send an HTTP request to my node server. I could also easily use the DOM with vanilla JS and HTML to display my logs in a nice, human-readable format:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>GH-SP-Logger</title>
</head>
<body>
  <h1>GH-Scrum-Poker-Error-Logger:</h1>
  <ol id="error-logs-list"></ol>

  <script>
    function fetchErrorLogs() {
      fetch("https://tricky-monkeys-sin.loca.lt/error-logger", { // TODO: replace with static origin-url of hosted node web-server
        method: "GET",
        mode: "cors",
        headers: {
          "Accept": "text/plain",
          "bypass-tunnel-reminder": "true" // Set the bypass-tunnel-reminder header TODO: remove this
        }
      })
        .then(response => response.text())
        .then(data => {
          const logs = data.split("---");
          // Look up the list element once, outside the loop
          const errorLogsElement = document.getElementById("error-logs-list");
          logs.forEach(log => {
            console.log(`Next Log Entry -- ${log}`);
            const li = document.createElement("li");
            const pre = document.createElement("pre");
            const space = document.createElement("p");
            pre.textContent = log;
            space.textContent = "---";
            li.appendChild(pre);
            li.appendChild(space);
            errorLogsElement.appendChild(li);
          });
        })
        .catch(err => console.error(err));
    }

    fetchErrorLogs();

    setInterval(fetchErrorLogs, 720 * 60 * 1000); // fires every 12 hours TODO: set this to something?

    /* Set interval to purge <li> items from <ol> every 2 weeks */
    setInterval(() => {
      const errorLogsElement = document.getElementById("error-logs-list");
      errorLogsElement.innerHTML = ""; // Clear all <li> items
    }, 14 * 24 * 60 * 60 * 1000);
  </script>
</body>
</html>

I utilized JS's native setInterval to constrain how frequently requests were sent (once every 12 hours) and to purge stale error logs (once every two weeks). I opted for a long time between purges since the ordered-list HTML element keeps the logs in sequence by time of occurrence (and each entry also has an ISO date string prepended). The free hosting for a static site on S3 should suffice, and this way I don't need any object-delete operations, which could incur extra costs.

This seemed all fine and dandy... but while setting up the static site on S3 I noticed something... CORS... (ughhhh)

I threw up a wildcard CORS policy (S3) in JSON for testing/development:

[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET",
            "HEAD"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": [],
        "MaxAgeSeconds": 3000
    }
]

As soon as the web server is deployed (dev and test run on localhost for now), the allowed origins will be swapped out, as in the sketch below.
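For reference, the swapped-in policy would look something like this, where https://app.example.com is a hypothetical placeholder standing in for the production server's actual origin:

[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET",
            "HEAD"
        ],
        "AllowedOrigins": [
            "https://app.example.com"
        ],
        "ExposeHeaders": [],
        "MaxAgeSeconds": 3000
    }
]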

📌Tunneling/Proxy:

During this process it occurred to me that the deployed S3 site would not be able to communicate with my locally running node server... I had used tunneling before with ngrok, but I wanted something lighter and simpler... enter localtunnel. With a simple CLI command, lt --port 3000, I could tunnel my locally running node process's port to a randomly generated, live HTTPS URL.
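(localtunnel also ships a Node API if you'd rather open the tunnel from inside the server process itself. A minimal sketch, assuming the server is already listening on port 3000:)

const localtunnel = require("localtunnel");

(async () => {
  // Open a tunnel to the locally running server on port 3000
  const tunnel = await localtunnel({ port: 3000 });
  console.log(`Tunnel live at: ${tunnel.url}`); // e.g. https://<random>.loca.lt

  tunnel.on("close", () => {
    console.log("Tunnel closed");
  });
})();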

You may have already noticed this URL in the S3 fetch call: fetch("https://tricky-monkeys-sin.loca.lt/error-logger" ...). This, along with all the wildcard origin access, will be swapped out for the safeguarded origin of the production server.

📌Node Endpoint and Middleware

Next, I had to set up an endpoint to receive the request from my S3 site's Fetch call. I knew that with node I could read the data (the logs) from the file on disk and send it over HTTP by creating an endpoint and a simple middleware.

First, I designed the middleware function:

import fs from "fs";
import path from "path";
import { Request, Response } from "express";

const pathToFile = path.resolve(__dirname, "..", "..", "error.log");

/* TODO: This reads from local error.log on disk... needs to be pruned incrementally? */
const errorLogger = async (_: Request, res: Response): Promise<void> => {
  try {
    console.log(`Reading from Error Log file on Disk.. -- ${pathToFile}`);
    fs.readFile(pathToFile, "utf-8", (err, data) => {
      if (err) {
        console.error(`404 root error.log could not be read.. ${err}`);
        res.sendStatus(404);
        return; // bail out so we don't also try to send data below
      }
      res.setHeader("Access-Control-Allow-Origin", "*"); /* TODO: SWAP OUT THIS "*" WITH ACTUAL ORIGIN (replace lt proxy) */
      res.send(data); /* TODO: swap out origin in S3 permissions: CORS when static origin created (when above swap happens) (also noted in src/index) */
    });
  } catch (error) {
    console.error(`500 Failure reading error.log from disk.. ${error}`);
    res.sendStatus(500);
  }
};

and hooked it up to an endpoint:

import cors from "cors";
import { Application, json, urlencoded } from "express";

export const configureServer = (server: Application) => {
    /* Global Middleware */
    server
        .use(middleware) // app-specific middleware stack, defined elsewhere
        .use(cors())
        .use(json())
        .use(urlencoded({ extended: true }));

    server.get("/error-logger", errorLogger);
// ...

📌CRON

The last step was to ensure that disk space was not being eaten up by the node fs write operations in the "production" environment. I elected to purge the file on disk once a day, since all of its log data would be sent to the S3 page every 12 hours anyway. To achieve this I implemented a simple CRON job:

import fs from "fs";
import path from "path";
import * as cron from "cron";

// Path to the local error log file defined earlier
const pathToFile = path.resolve(__dirname, "..", "..", "error.log");

// Define a cron job that runs every day at midnight
// (six fields: second minute hour day-of-month month day-of-week)
const job = new cron.CronJob("0 0 0 * * *", () => {
  // Truncate the error log file to empty it
  fs.truncate(pathToFile, 0, (err) => {
    if (err) {
      console.error("Error truncating error.log file:", err);
    } else {
      console.log("Disk error.log file has been cleaned out.");
    }
  });
});

// Start the cron job
job.start();

Hopefully you found this interesting and/or informative! Leave a comment if you have any suggestions 😋

P.S.

Depending on the success of this application, I may migrate to a paid service such as Sentry.io for a much more elegant error-monitoring process. (But I think this workflow has pretty decent DevX, since the team only needs to visit a URL to see a styled HTML render.)
