DynamoDB and Projection Expressions – Why?

tl;dr

What it does do:

Returns ONLY the data you requested.
Saves you network traffic between your app and Dynamo.

What it doesn’t do:

Save you read units.

The details:

Problem:

Recently, while modifying one of our API endpoints, I realized I needed to make another read from the DB. One of the things we are struggling with is DynamoDB’s read/write capacity unit limits and the pricing model built around them. We use the DB as something of a document store, and reads/writes can cause some pretty significant spikes, which can get the DB throttled. Not great.

I decided that, rather than reuse an existing read that fetches the whole row, I’d specify the columns I needed with a projection expression so that only that data was returned. In theory, this would save a significant number of read units. Great!

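In DynamoDB, reads are billed in 4 KB blocks. The item size is rounded up to the next 4 KB, and each block costs one read unit for a strongly consistent read (half a unit for an eventually consistent one). So a read that gets 58 B out of the DB is still charged for a full 4 KB block.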
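The rounding rule fits in a small helper. This is a sketch, not an SDK function. `read_capacity_units` is a hypothetical name, and the helper assumes the documented provisioned-capacity rules: item size is rounded up to the next 4 KB, a strongly consistent read costs one unit per block, and an eventually consistent read costs half that.

```python
import math

# One read "block" is 4 KB of item data.
READ_BLOCK_BYTES = 4 * 1024


def read_capacity_units(item_size_bytes: int, strongly_consistent: bool = True) -> float:
    """Estimate the read capacity units one GetItem consumes for an item of this size."""
    # Round the item size up to whole 4 KB blocks; even a tiny item costs one block.
    blocks = max(1, math.ceil(item_size_bytes / READ_BLOCK_BYTES))
    # Strongly consistent: 1 unit per block. Eventually consistent: half a unit per block.
    return float(blocks) if strongly_consistent else blocks / 2


print(read_capacity_units(58))                             # 1.0
print(read_capacity_units(58, strongly_consistent=False))  # 0.5
print(read_capacity_units(10 * 1024))                      # 3.0
```

The key question for the rest of this post is which size goes into that function: the 58 B we asked for, or the whole row.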
Currently, a row on this table looks a little something like this:

[Image: an example row from this table and its read units]

Intended Solution:

Reading only those 3 columns was supposed to save me about 98% of my read units, and being a little more specific took only a little more effort to implement. Instead of using the SDK’s Document model to make the call, I used a GetItemRequest with a ProjectionExpression set to the columns I needed. Pretty straightforward: no local secondary indexes or anything like that needed to be set up.

[Image: code for the GetItemRequest with a ProjectionExpression]
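For readers not on .NET, roughly the same request can be sketched in Python as plain GetItem parameters, which is the shape boto3's low-level `get_item` accepts. `build_get_item_request`, `MyTable`, `Id`, `Status` and `ModifiedOn` are hypothetical stand-ins for our actual table and columns. The `#pN` placeholders are one common way to avoid clashes with DynamoDB reserved words. Nothing in this post says which of our column names would clash.

```python
from typing import Dict, List


def build_get_item_request(table_name: str, key_name: str, key_value: str,
                           columns: List[str], consistent: bool = False) -> Dict:
    """Build GetItem parameters that ask DynamoDB to return only `columns`."""
    # Refer to each column through a "#pN" placeholder so names that clash
    # with DynamoDB reserved words still work.
    names = {f"#p{i}": column for i, column in enumerate(columns)}
    return {
        "TableName": table_name,
        "Key": {key_name: {"S": key_value}},      # assumes a string partition key
        "ProjectionExpression": ", ".join(names),  # joins the placeholders, e.g. "#p0, #p1, #p2"
        "ExpressionAttributeNames": names,
        "ConsistentRead": consistent,
    }


request = build_get_item_request("MyTable", "Id", "form-123", ["Id", "Status", "ModifiedOn"])
print(request["ProjectionExpression"])  # #p0, #p1, #p2
# With boto3 (not run here): boto3.client("dynamodb").get_item(**request)
```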

This was wrong. I mentioned to one of my co-workers: “Check this out, I don’t need a secondary index but I can still only request what I need. Look at all the read units we’ll save!” He insisted that a full row’s worth of read units was still consumed, but when we looked for proof of what the call actually consumed, we both came up pretty dry.
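One way to get that proof is GetItem's `ReturnConsumedCapacity` parameter, which makes DynamoDB report the capacity a call consumed. Here is a minimal sketch assuming a boto3 DynamoDB client. `consumed_read_units` is a hypothetical helper, and the table and key names in the commented usage are placeholders.

```python
from typing import Any, Dict


def consumed_read_units(client: Any, params: Dict) -> float:
    """Run a GetItem and return the capacity units DynamoDB says it consumed.

    `client` is expected to behave like boto3.client("dynamodb").
    """
    # ReturnConsumedCapacity="TOTAL" asks DynamoDB to include a ConsumedCapacity
    # block in the response; it overrides any value already in `params`.
    response = client.get_item(**{**params, "ReturnConsumedCapacity": "TOTAL"})
    return response["ConsumedCapacity"]["CapacityUnits"]


# Usage against a real table (not run here; names are hypothetical):
# import boto3
# client = boto3.client("dynamodb")
# key = {"TableName": "MyTable", "Key": {"Id": {"S": "form-123"}}}
# print(consumed_read_units(client, key))
# print(consumed_read_units(client, {**key, "ProjectionExpression": "Id"}))
```

If the full item is what gets charged, the two printed numbers should be the same.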

What actually happens:

So I looked into it myself to figure out what the SDK is doing. If you pull down the SDK source and check out the .net45 project, you can see that the SDK makes an HTTP POST to the DynamoDB API, passing the ProjectionExpression in the body of the request.
The SDK also has an unmarshaller that turns the response back into SDK objects. Nothing is filtered out there, so the ProjectionExpression must be applied within the DynamoDB API itself.
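The request the SDK sends can be sketched in Python. The sketch assumes DynamoDB's documented low-level JSON protocol (`application/x-amz-json-1.0` with the `DynamoDB_20120810.GetItem` target) and leaves out request signing. `get_item_http_request` is a hypothetical helper, and `MyTable`, `Id` and `Status` are placeholder names.

```python
import json
from typing import Dict, Tuple


def get_item_http_request(params: Dict) -> Tuple[str, Dict[str, str], str]:
    """Return (method, headers, body) for an unsigned DynamoDB GetItem call."""
    headers = {
        "Content-Type": "application/x-amz-json-1.0",
        "X-Amz-Target": "DynamoDB_20120810.GetItem",
        # A real client also adds SigV4 signing headers (Authorization, X-Amz-Date).
    }
    # The ProjectionExpression travels inside the JSON body, so the filtering
    # has to happen on the DynamoDB side rather than in the client.
    return "POST", headers, json.dumps(params)


method, headers, body = get_item_http_request({
    "TableName": "MyTable",
    "Key": {"Id": {"S": "form-123"}},
    "ProjectionExpression": "#p0, #p1",
    "ExpressionAttributeNames": {"#p0": "Id", "#p1": "Status"},
})
print(method, headers["X-Amz-Target"])  # POST DynamoDB_20120810.GetItem
```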

This did show one benefit of ProjectionExpressions: bandwidth *is* being saved. Unfortunately, the DynamoDB API is something of a black box, so it was hard to tell what was going on inside.

We did have a theory, however.
Once the DynamoDB API receives the request, it finds the first available shard (in this case it was not a strongly consistent read), reads the entire row, and then extracts and returns only the relevant data.

[Diagram: the theorized request flow through the DynamoDB API]
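The theory can be written as a toy model. This is our guess at the behaviour, not DynamoDB's implementation. `theorized_get_item`, the sample row and the size estimate are assumptions for illustration. The estimate counts attribute-name bytes plus string-value bytes, so it only covers string attributes. The model shows the projection shrinking what is returned while the charge is still computed from the whole row.

```python
import math
from typing import Dict, List, Tuple

READ_BLOCK_BYTES = 4 * 1024


def approx_item_size(item: Dict[str, str]) -> int:
    """Rough item size: UTF-8 bytes of each attribute name plus its string value."""
    return sum(len(name.encode("utf-8")) + len(value.encode("utf-8"))
               for name, value in item.items())


def theorized_get_item(item: Dict[str, str], projection: List[str],
                       strongly_consistent: bool = False) -> Tuple[Dict[str, str], float]:
    """Model of our theory: read the whole item, charge for all of it, then project."""
    # 1. The full item is read, so capacity is computed from its full size.
    blocks = max(1, math.ceil(approx_item_size(item) / READ_BLOCK_BYTES))
    consumed = float(blocks) if strongly_consistent else blocks / 2
    # 2. Only the requested attributes are sent back over the network.
    projected = {name: value for name, value in item.items() if name in projection}
    return projected, consumed


# Hypothetical row: three small columns plus a large 'Document' column.
row = {"Id": "form-123", "Status": "Open", "ModifiedOn": "2017-01-01", "Document": "x" * 20000}
projected, rcu = theorized_get_item(row, ["Id", "Status", "ModifiedOn"])
print(sorted(projected))  # ['Id', 'ModifiedOn', 'Status']
print(rcu)                # 2.5
```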

We weren’t sure this was exactly what happened. Rather than set up a couple of examples for each model and measure the read unit metrics, we contacted our solutions architect and gave them the request above as the example.
The response we got was:

“The full item size will be consumed from the provisioned capacity, regardless of the projected expression. Projected expressions do optimize network traffic as well as latency due to parsing (as it happens DynamoDB side). But do not decrease the consumed capacity.”
– AWS Architect

This confirmed our theory.

The proposed way to get the behavior we had originally expected was to set up a local secondary index that does not project the ‘Document’ column.
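That suggestion can be sketched as CreateTable parameters. The schema is hypothetical: our table's real keys aren't described here, so `Id`, `Version`, `ModifiedOn`, `SlimIndex` and `Status` are placeholders, and `create_table_with_slim_lsi` is an illustrative helper. Two constraints shape it. A local secondary index shares the table's partition key but uses a different sort key, and it can only be defined when the table is created. A query through the index that asks only for projected attributes is charged on the size of the index entry rather than the full item.

```python
from typing import Dict, List


def create_table_with_slim_lsi(table_name: str, included_columns: List[str]) -> Dict:
    """CreateTable parameters with an LSI that does not project 'Document'.

    Hypothetical schema: partition key "Id", table sort key "Version",
    index sort key "ModifiedOn".
    """
    if "Document" in included_columns:
        raise ValueError("the index exists to leave 'Document' out")
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "Id", "AttributeType": "S"},
            {"AttributeName": "Version", "AttributeType": "N"},
            {"AttributeName": "ModifiedOn", "AttributeType": "S"},
        ],
        # An LSI requires the table to have a composite (partition + sort) key.
        "KeySchema": [
            {"AttributeName": "Id", "KeyType": "HASH"},
            {"AttributeName": "Version", "KeyType": "RANGE"},
        ],
        "LocalSecondaryIndexes": [{
            "IndexName": "SlimIndex",
            # Same partition key as the table, different sort key.
            "KeySchema": [
                {"AttributeName": "Id", "KeyType": "HASH"},
                {"AttributeName": "ModifiedOn", "KeyType": "RANGE"},
            ],
            # INCLUDE copies the key attributes plus only these non-key attributes.
            "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": included_columns},
        }],
        "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    }


params = create_table_with_slim_lsi("MyTable", ["Status"])
# boto3.client("dynamodb").create_table(**params)  (not run here)
```

One trade-off is relevant to our write-unit concerns. Every write to the table also updates the index, and that update consumes write capacity too.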

What we ended up doing:

By this point, finding out what actually happened was more of a curiosity than anything. We decided that while it would have been nice to save the read units, read units were no longer as much of an issue for us as write units had become. So we ended up using a call that requests and returns the entire row, keeping in mind that we have an alternative if we need it.
