A Mental Model for DynamoDB Write Limits

Roger Chi

Posted on March 4, 2024

Amazon's DynamoDB is promoted as a zero-maintenance NoSQL database with virtually unlimited throughput and scale*. Because of its very low administrative overhead and serverless billing model, it has long been my data store of choice when developing cloud applications and services.

When working with any tool, it is important to understand its limits so you can build a mental model for planning and implementing its usage effectively. Because DynamoDB bills itself as offering "unlimited throughput and scale," it is tempting to assume you never need to think about read and write throughput. However, as we will explore in this post, there are very well defined limits that we need to take into account when implementing solutions with DynamoDB.

Partition limits

Amazon documents the following partition-level limits: 3,000 RCU/s and 1,000 WCU/s per partition. Let's verify these limits and put them to the test. We want to understand how they apply to the UpdateItem command, and whether they are any different for direct PutItem commands.
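Before running anything, it helps to turn the documented limit into expected numbers. Here is a small back-of-the-envelope helper (my own sketch, not part of the test repo) that assumes a standard write costs 1 WCU per 1 KB of item size, rounded up, and that a single item can consume at most 1,000 WCU/s:

// Back-of-the-envelope sketch: how many standard writes per second a single
// item can sustain under the 1,000 WCU/s partition cap.
// Assumption: 1 WCU per 1 KB (1,024 bytes) of item size, rounded up.
const PARTITION_WCU_PER_SECOND = 1000;

function wcusPerWrite(itemSizeBytes: number): number {
  return Math.ceil(itemSizeBytes / 1024);
}

function maxWritesPerSecond(itemSizeBytes: number): number {
  return Math.floor(PARTITION_WCU_PER_SECOND / wcusPerWrite(itemSizeBytes));
}

console.log(maxWritesPerSecond(1000)); // 1000
console.log(maxWritesPerSecond(2000)); // 500
console.log(maxWritesPerSecond(4000)); // 250

For a 1,000-byte item this predicts 1,000 writes per second, 500 for a 2,000-byte item, and 250 for a 4,000-byte item, which is exactly what we will try to reproduce below.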

Test setup

CDK code can be found here: https://github.com/rogerchi/ddb-writes-model

We create Items of a predefined size by filling them with random binary data. We then set up Lambda functions that spam our DynamoDB Item with either 10,000 atomic updates or 10,000 full puts, and a stream handler that captures the times of the first and last updates so we can measure how long the 10,000 operations take to complete.
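For reference, here is a minimal sketch of what such a stream handler could look like. The real handler lives in the linked repo; the timestamps come from the stream record's ApproximateCreationDateTime, but the exact implementation details below are illustrative:

import type { DynamoDBStreamEvent } from 'aws-lambda';

// Logs the earliest and latest write times seen in each stream batch.
// Taking the overall min/max across log lines gives the total elapsed time
// for the 10,000 operations.
exports.handler = async (event: DynamoDBStreamEvent) => {
  let first = Number.POSITIVE_INFINITY;
  let last = Number.NEGATIVE_INFINITY;

  for (const record of event.Records) {
    const ts = record.dynamodb?.ApproximateCreationDateTime;
    if (ts === undefined) continue;
    first = Math.min(first, ts);
    last = Math.max(last, ts);
  }

  console.log(JSON.stringify({ first, last, records: event.Records.length }));
};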

UpdateItem code

We first initialize an Item with a count of 0 and then run this lambda function against it.

import { DynamoDBClient, UpdateItemCommand } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});
const TABLE_NAME = process.env.TABLE_NAME!;

async function updateItem(id: string) {
  const params = {
    TableName: TABLE_NAME,
    Key: {
      pk: { S: id },
      sk: { S: 'ITEM' },
    },
    UpdateExpression: 'set #count = #count + :one',
    ExpressionAttributeNames: {
      '#count': 'count',
    },
    ExpressionAttributeValues: {
      ':one': { N: '1' },
    },
  };

  let attempt = 0;
  // Retry forever with no backoff so we keep constant pressure on the item.
  while (true) {
    try {
      const command = new UpdateItemCommand(params);
      await client.send(command);
      return;
    } catch (error) {
      attempt++;
      console.error(`Update failed for item ${id}. Attempt ${attempt}.`);
    }
  }
}

interface Event {
  id: string;
}

exports.handler = async (event: Event) => {
  const id = event.id;
  if (!id) {
    throw new Error('Event does not contain an ID.');
  }

  const totalUpdates = 10000;
  const commands = [];

  for (let i = 0; i < totalUpdates; i++) {
    commands.push(updateItem(id));
  }

  // Fire all 10,000 updates concurrently.
  await Promise.all(commands);

  console.log('All updates complete.');
};

UpdateItem results (10,000 updates)

Item Size (bytes) | Total time (seconds) | Ops/s | WCU/s
1,000             | 10                   | 1,000 | 1,000
2,000             | 20                   | 500   | 1,000
4,000             | 40                   | 250   | 1,000

As we can see, the results match the expected write throughput described in the documentation. Does the throughput change if we use PutItem instead of UpdateItem, since the underlying service doesn't need to retrieve an existing item before writing?

PutItem code

import { randomBytes } from 'crypto';
import { DynamoDBClient, PutItemCommand } from '@aws-sdk/client-dynamodb';
import { marshall } from '@aws-sdk/util-dynamodb';

const client = new DynamoDBClient({});
const TABLE_NAME = process.env.TABLE_NAME!;

// Returns `size` bytes of random data (stands in for the helper in the linked repo).
function generateData(_id: string, size: number) {
  return randomBytes(size);
}

async function putItem(id: string, itemBody: any, count: number) {
  const updatedItem = { ...itemBody, count: { N: `${count}` } };
  const params = {
    TableName: TABLE_NAME,
    Item: updatedItem,
  };

  let attempt = 0;
  // Retry forever with no backoff so we keep constant pressure on the item.
  while (true) {
    try {
      const command = new PutItemCommand(params);
      await client.send(command);
      return;
    } catch (error) {
      attempt++;
      console.error(`Put failed for item ${id}. Attempt ${attempt}.`, error);
    }
  }
}

interface Event {
  id: string;
  size: number;
}

exports.handler = async (event: Event) => {
  const { id, size } = event;

  const data = generateData(id, size);
  const item = marshall({
    pk: id,
    sk: 'PUTS',
    size,
    data: Uint8Array.from(data),
  });

  const totalUpdates = 10000;
  const commands = [];

  for (let i = 0; i < totalUpdates; i++) {
    commands.push(putItem(id, item, i));
  }

  // Fire all 10,000 puts concurrently.
  await Promise.all(commands);

  console.log('All updates complete.');
};

PutItem results

Item Size (bytes) | Total time (seconds) | Ops/s | WCU/s
1,000             | 10                   | 1,000 | 1,000
2,000             | 20                   | 500   | 1,000
4,000             | 40                   | 250   | 1,000

Our PutItem results are identical to our UpdateItem results, consistent with the idea that the maximum write throughput for a single Item is 1,000 WCU/s.

TransactWriteItems

TransactWriteItems costs two WCUs per 1 KB to write the Item to the table: one WCU to prepare the transaction and one to commit it. Given what we learned above, you might assume that for a 1 KB Item we can perform 500 transactions per second. Does that prove out?
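Extending the earlier back-of-the-envelope helper (again, my own sketch), the expected ceiling for transactional writes against a single item would be:

function expectedTransactionsPerSecond(itemSizeBytes: number): number {
  // Assumption: 2 WCU per 1 KB for a transactional write (prepare + commit)
  // and the 1,000 WCU/s per-item ceiling.
  const wcusPerTransaction = 2 * Math.ceil(itemSizeBytes / 1024);
  return Math.floor(1000 / wcusPerTransaction);
}

console.log(expectedTransactionsPerSecond(1000)); // 500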

Code

import { DynamoDBClient, TransactWriteItemsCommand } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});
const TABLE_NAME = process.env.TABLE_NAME!;

async function transactUpdateItem(id: string) {
  const params = {
    TableName: TABLE_NAME,
    Key: {
      pk: { S: id },
      sk: { S: 'TRXS' },
    },
    UpdateExpression: 'set #count = #count + :one',
    ExpressionAttributeNames: {
      '#count': 'count',
    },
    ExpressionAttributeValues: {
      ':one': { N: '1' },
    },
  };

  let attempt = 0;
  // Retry forever with no backoff so we keep constant pressure on the item.
  while (true) {
    try {
      // A single-item transaction: the same update as before, wrapped in TransactWriteItems.
      const command = new TransactWriteItemsCommand({
        TransactItems: [{ Update: { ...params } }],
      });
      await client.send(command);
      return;
    } catch (error) {
      attempt++;
      console.error(`Update failed for item ${id}. Attempt ${attempt}.`);
    }
  }
}

interface Event {
  id: string;
}

exports.handler = async (event: Event) => {
  const id = event.id;
  if (!id) {
    throw new Error('Event does not contain an ID.');
  }

  const totalUpdates = 10000;
  const commands = [];

  for (let i = 0; i < totalUpdates; i++) {
    commands.push(transactUpdateItem(id));
  }

  // Fire all 10,000 transactional updates concurrently.
  await Promise.all(commands);

  console.log('All updates complete.');
};

TransactWriteItems results

Item Size (bytes) | Total time (seconds) | Ops/s | WCU/s
1,000             | 60                   | 167   | 333
2,000             | 90                   | 111   | 444
4,000             | 130                  | 77    | 615
8,000             | 188                  | 53    | 851
16,000            | 370                  | 27    | 864

We see from the TransactWriteItems results that we cannot achieve the full 1,000 WCU/s through transactions. There appears to be some fundamental per-transaction overhead: smaller items fall far short of 1,000 WCU/s, and it is not until item sizes exceed 8 KB that throughput gets close to (but still does not reach) the 1,000 WCU/s partition limit.

Our mental model for DynamoDB Writes

So what's our foundational mental model for DynamoDB write limits? Every write targets a specific Item, and that Item can absorb up to 1,000 WCU/s of Put or Update actions. This means an Item of up to 1 KB can be written or updated 1,000 times a second, a 2 KB Item 500 times a second, and a 200 KB Item only 5 times a second. To exceed these limits, you must shard your writes.

Because of instant adaptive capacity, you can think of each Item as its own separate partition, literally a whole set of infrastructure dedicated to that single Item, which is pretty darn amazing. It's why no other service shards quite like DynamoDB, and why it is the ultimate multi-tenant service, a real example of the power of scale in the cloud.
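As an illustration of write sharding (my own sketch, not from the test repo), a hot counter can be spread across a handful of shard items, each with its own partition key; writes pick a shard at random and reads sum across all shards:

import { DynamoDBClient, UpdateItemCommand } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});
const TABLE_NAME = process.env.TABLE_NAME!;
const SHARD_COUNT = 10; // tune to the write rate you need

async function incrementShardedCounter(id: string) {
  // Pick a shard at random; reads later sum the counts across all shards.
  const shard = Math.floor(Math.random() * SHARD_COUNT);
  await client.send(
    new UpdateItemCommand({
      TableName: TABLE_NAME,
      Key: {
        pk: { S: `${id}#${shard}` },
        sk: { S: 'ITEM' },
      },
      UpdateExpression: 'SET #count = if_not_exists(#count, :zero) + :one',
      ExpressionAttributeNames: { '#count': 'count' },
      ExpressionAttributeValues: { ':one': { N: '1' }, ':zero': { N: '0' } },
    }),
  );
}

Under the mental model above, each shard Item can absorb up to 1,000 WCU/s of its own, so ten shards raise the ceiling for this logical counter to roughly 10,000 WCU/s, at the cost of a slightly more expensive read path.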

The caveat is that write throughput drops significantly once transactions are introduced: from around 167 operations per second for a 1 KB item down to 27 operations per second for a 16 KB item.

About me

I am a Staff Engineer @ Veho with a passion for serverless.
