- Excelling With DynamoDB
- Posts
- Why You Should Avoid Filter Expressions In DynamoDB
Why You Should Avoid Filter Expressions In DynamoDB
Filter expressions do not work in the way you think they do, here's what to do instead.
When developers first start working with DynamoDB, they often approach it with a relational database mindset.
Queries and scans feel familiar, and the introduction of filter expressions seems like a helpful tool to narrow down results.
After all, filtering is a common practice in SQL, so why not apply it the same way in DynamoDB?
But here’s the catch: filter expressions in DynamoDB don’t behave like you might expect.
They are not part of the query itself. Instead, they are applied after DynamoDB retrieves the matching items from the specified table or index.
This distinction is subtle but really important to understand.
I’ve seen developers misuse filter expressions and this always leads to significant performance issues, higher costs, and even unexpected throttling.
Let’s dive into why you should avoid relying on filter expressions, and what you should be doing instead.
What Filter Expressions Actually Do
In DynamoDB, a filter expression is evaluated on the items that are already retrieved by a Query or Scan operation.
Let’s take the below example:
const command = new QueryCommand({
TableName: "orders",
KeyConditionExpression: "customerId = :cid",
FilterExpression: "status = :status",
ExpressionAttributeValues: {
":cid": { S: "123" },
":status": { S: "pending" }
}
});
At first, this query looks efficient.
You might assume DynamoDB only retrieves orders that match both the customerId and the status (of the order).
But under the hood, what is happening is DynamoDB fetches all items for that customer and then discards those that don’t match status value of “pending”.
That means the filtering happens in memory, after the database has already done the work of reading the items.
If a customer has 100,000 orders, DynamoDB will pull all of them (up to it’s 1MB limit) and then trim the results. Your application only sees the pending ones, but you still pay for the read capacity and latency of scanning everything.
Why This Becomes a Problem
There are several reasons why using Filter Expressions can be problematic.
Here are the main ones.
1. Wasted Read Capacity Units (Increased Costs)
DynamoDB charges for every item it reads, not just the items returned.
If you query 100,000 items and filter down to 10, you still pay for 100,000 reads.
That can become very expensive very quickly.
2. Increased Latency
More reads mean slower queries. You will see queries taking seconds instead of milliseconds when querying large datasets with filters.
3. Pagination Issues
When using filter expressions and paginating results with “LastEvaluatedKey”, DynamoDB will paginate results based on the full dataset, not on the filtered set.
What this means is some pages of results will often contain less items than others and the pagination flow may be offset by a large gap in a filtered dataset.
What To Do Instead
The solution is to avoid filters as much as possible. DynamoDB is not optimized for post-query filtering, it’s optimized for precisely targeted access patterns.
Essentially this means you should be using different methods for filtering data efficiently.
Here are some of these strategies:
Filter directly by the sort key
Key conditions are part of the query and are evaluated directly by DynamoDB’s. If you can model your partition key and sort key to align with your access patterns, you won’t need filter expressions at all.
For example, instead of querying all orders for a customer and filtering by status, you could store orders with a sort key like <status>#<timestamp>. That way, querying for pending orders is a direct and efficient operation.
Create Global Secondary Indexes (GSIs)
If you need to query by attributes that aren’t part of your primary key, a GSI is usually the best solution.
In the orders example, you could create a GSI with status as the partition key and orderDate as the sort key. Now you can query directly for pending orders without scanning unrelated data.
Use Sparse Indexes
Sparse indexes are GSIs that only project items with certain attributes.
By designing your schema so that only relevant items are written to an index, you avoid storing unnecessary data and keep queries efficient.
For example, a GSI where the partition key is pendingOrder (only set for pending orders) ensures that queries for pending orders are fast and cheap.
When Are Filter Expressions Acceptable?
There are cases where filter expressions are acceptable, especially for small datasets or administrative tools where occasional inefficiency isn’t critical.
If you’re filtering on a handful of items and don’t expect large growth, filters can be convenient.
But for production workloads with potentially large datasets, filter expressions should be avoided.
👋 My name is Uriel Bitton and I hope you learned something in this edition of Excelling With DynamoDB.
📅 If you're looking for help with DynamoDB, let's have a quick chat.
🙌 I hope to see you in next week's edition!