Designing A Scalable Voting System With DynamoDB

Learn DynamoDB efficient table design by designing a voting system.

Imagine the following scenario.

You are tasked with designing a DynamoDB table to store votes for the upcoming presidential election.

You typically have several candidates but only two major ones.

These two naturally get the majority of the votes. And on election day, each of them can easily receive 100K votes per second, if not more.

We can use DynamoDB as we know it’s an ultra-scalable database.

However, a single DynamoDB partition (and therefore a single item) can handle a maximum of roughly 1,000 writes per second.

We can’t afford to throttle 99K voters.

A Naive Solution

Here is the naive design solution. We’ll later present a much more efficient design to solve this problem.

[Diagram of the naive table design, drawn with Excalidraw]

In this situation, we have inherently low-cardinality partition keys (only two high-traffic competing candidates) and very high throughput (more than 100K writes per second per candidate).

This table design will throttle the majority of users attempting to vote. This is obviously not ideal.
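To make the bottleneck concrete, here's a minimal sketch of a vote write against this naive design, assuming a hypothetical table named "Votes" whose partition key attribute is "candidate" and whose tally is kept in a numeric "voteCount" attribute (both names are illustrative):

```python
import boto3

# Hypothetical "Votes" table: partition key attribute "candidate",
# running tally in a numeric "voteCount" attribute.
votes = boto3.resource("dynamodb").Table("Votes")

def record_vote_naive(candidate: str) -> None:
    # Every vote for this candidate lands on the same item, and therefore
    # the same partition, capping throughput at ~1,000 writes per second.
    votes.update_item(
        Key={"candidate": candidate},
        UpdateExpression="ADD voteCount :one",
        ExpressionAttributeValues={":one": 1},
    )

record_vote_naive("candidate A")
```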

How can we design this better to solve this problem?

An Efficient Solution

Oftentimes, the real world already holds the solution to a system design problem.

In real-life voting systems, you never have a single voting station in the country for one candidate. You have several in each state, and more in each city and neighborhood.

Instead of every American lining up to vote at a single office, there are multiple offices spread across the country's cities and states.

When the voting ends, the results are aggregated and summed up to get the total count for each candidate.

We can mirror this solution in our database design.

Let’s split the partition key “candidate” into multiple keys, one for each city within each state.

For example, we can have:
- “candidate A#NY”
- “candidate A#CA”
- “candidate A#TX”
- …and so on

Or, more fine-grained and even more efficient:

- “candidate A#NY#queens”
- “candidate A#CA#hollywood”
- “candidate A#TX#kingwood”

Here’s what it would look like in our table.

This is a much more efficient design for a few reasons:

  1. It mimics the natural way of voting for a candidate

  2. It offers more fine-grained stats on votes per city/region

  3. It allows for a much more scalable design: we can write and read in parallel at a larger scale.

With this design we can localize write requests and route each vote to the appropriate partition (candidate#state#city).
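As a rough sketch, and assuming the same hypothetical "Votes" table and attribute names as in the naive example, the sharded write path could look like this:

```python
import boto3

# Same hypothetical "Votes" table as before, but the partition key value
# is now sharded by candidate, state, and city.
votes = boto3.resource("dynamodb").Table("Votes")

def shard_key(candidate: str, state: str, city: str) -> str:
    # e.g. "candidate A#TX#houston": one partition per candidate/state/city
    return f"{candidate}#{state}#{city}"

def record_vote(candidate: str, state: str, city: str) -> None:
    # Votes are spread across many partitions, so no single partition
    # has to absorb the full 100K+ writes per second.
    votes.update_item(
        Key={"candidate": shard_key(candidate, state, city)},
        UpdateExpression="ADD voteCount :one",
        ExpressionAttributeValues={":one": 1},
    )

record_vote("candidate A", "CA", "los-angeles")
record_vote("candidate A", "TX", "houston")
```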

To get the total vote count, we can use BatchGetItem to fetch the counter item of each partition key (candidate#state#city).

Summing the voteCount attribute across these partitions gives us the total votes for a particular candidate.
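A minimal sketch of that aggregation, again assuming the hypothetical "Votes" table and "candidate"/"voteCount" attributes from the examples above:

```python
import boto3

dynamodb = boto3.resource("dynamodb")

def total_votes(shard_keys: list[str]) -> int:
    # Fetch each shard's counter item with BatchGetItem (up to 100 keys
    # per request) and sum the voteCount attributes.
    total = 0
    for i in range(0, len(shard_keys), 100):
        batch = shard_keys[i:i + 100]
        response = dynamodb.batch_get_item(
            RequestItems={"Votes": {"Keys": [{"candidate": k} for k in batch]}}
        )
        for item in response["Responses"]["Votes"]:
            total += int(item.get("voteCount", 0))
        # Production code should also retry response.get("UnprocessedKeys").
    return total

print(total_votes(["candidate A#CA#los-angeles", "candidate A#TX#houston"]))
```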

This partition key design gives us two benefits:

  1. We can write at scale and in parallel

  2. We can read at scale

Let’s see how we can further optimize the reads with this design.

Scalable Reads of Votes

Writing data is fairly straightforward: we determine each voter's region and write to the corresponding partition key.

For example, if a voter is in Los Angeles, California, and another is in Houston, Texas, their votes would be written to the following items in our table.

However, reading this data is not as straightforward, especially at scale. Let’s see how we can redesign this further to solve this issue.

We’ll make the following changes to our primary key:

  • The partition key will stay as it is, but will contain nothing more specific than the city (as we saw above).

  • The sort key will contain the zip code of the area the voter resides in, prefixed with “ZIP#”.

Here is the final redesign:
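As a rough sketch, a vote write against this redesigned table might look like the following. The sort key attribute name, "zipCode", is an assumption for illustration; only the "ZIP#"-prefixed value format comes from the design itself.

```python
import boto3

# Same hypothetical "Votes" table, now with a composite primary key:
# partition key "candidate" and an assumed sort key attribute "zipCode".
votes = boto3.resource("dynamodb").Table("Votes")

def record_vote_by_zip(candidate: str, state: str, city: str, zip_code: str) -> None:
    votes.update_item(
        Key={
            "candidate": f"{candidate}#{state}#{city}",  # partition key
            "zipCode": f"ZIP#{zip_code}",                # sort key
        },
        UpdateExpression="ADD voteCount :one",
        ExpressionAttributeValues={":one": 1},
    )

record_vote_by_zip("candidate A", "CA", "los-angeles", "90001")
```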

With this table design, we now enable a few new access patterns.

We can:

  • Get all of a candidate’s votes in a city.

  • Get a candidate’s votes within a range of zip codes (as sketched below).

  • Suffix each item’s sort key with the voter’s ID (e.g. ZIP#90001#user-201) to look up an individual voter’s data.
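Here's a minimal sketch of the zip-code-range pattern, reusing the same hypothetical "Votes" table and "candidate"/"zipCode" key attributes as above (pagination is omitted for brevity):

```python
import boto3
from boto3.dynamodb.conditions import Key

votes = boto3.resource("dynamodb").Table("Votes")

def votes_in_zip_range(shard_key: str, low_zip: str, high_zip: str) -> int:
    # Query one candidate#state#city partition for a range of zip codes
    # and sum the counters of the matching items.
    response = votes.query(
        KeyConditionExpression=Key("candidate").eq(shard_key)
        & Key("zipCode").between(f"ZIP#{low_zip}", f"ZIP#{high_zip}")
    )
    return sum(int(item.get("voteCount", 0)) for item in response["Items"])

# e.g. candidate A's votes across Los Angeles zip codes 90001 to 90099
print(votes_in_zip_range("candidate A#CA#los-angeles", "90001", "90099"))
```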

Conclusion

We have managed to design an efficient, large-scale DynamoDB table that accommodates a national voting system.

By carefully thinking about scalability and efficiency, and by modeling real-life voting systems, we were able to refactor a naive design into a highly efficient and scalable one.

This redesign not only lets votes be written to thousands of partitions in parallel, but also lets us read the data back efficiently for statistical analysis.

👋 My name is Uriel Bitton and I hope you learned something in this edition of Excelling With DynamoDB.

📅 If you're looking for help with DynamoDB, let's have a quick chat.

🙌 I hope to see you in next week's edition!