How to use AWS Data Pipeline to copy between DynamoDB tables

Today I have an interesting task: one of our DynamoDB tables needs a sort key (formerly called a range key). Unlike a secondary index, a sort key cannot be added by altering the existing table, because that would change the table's primary key, and DynamoDB does not allow such an operation. The easiest approach seems to be:

  1. Create a new table B with a partition key and a sort key, copy the data from table A to table B;
  2. Drop table A, and recreate table A with the same schema as table B, then copy the data from table B to table A.
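Step 1 can be sketched in Python. This is only an illustration: the table name and the attribute names (`table_b`, `user_id`, `created_at`) are made up, and the throughput numbers are placeholders. The helper just builds the arguments for DynamoDB's CreateTable call, where `HASH` marks the partition key and `RANGE` marks the sort key.

```python
def build_create_table_args(table_name, partition_key, sort_key,
                            read_units=5, write_units=5):
    """Return keyword arguments for DynamoDB CreateTable with a composite key.

    Each key is an (attribute_name, attribute_type) pair, where the type is
    "S" (string), "N" (number) or "B" (binary).
    """
    (pk_name, pk_type), (sk_name, sk_type) = partition_key, sort_key
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": pk_name, "AttributeType": pk_type},
            {"AttributeName": sk_name, "AttributeType": sk_type},
        ],
        "KeySchema": [
            {"AttributeName": pk_name, "KeyType": "HASH"},   # partition key
            {"AttributeName": sk_name, "KeyType": "RANGE"},  # sort key
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


# Hypothetical table and attribute names, for illustration only.
args = build_create_table_args("table_b", ("user_id", "S"), ("created_at", "N"))

# With boto3 installed and credentials configured, the table could be created by:
#   import boto3
#   boto3.client("dynamodb").create_table(**args)
```

For a bulk copy, the write capacity of the new table would need to be set much higher than these placeholder values, at least for the duration of the load.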

The downside is that there will be an outage, but we are okay with that if we can minimize the duration to be just a couple of hours.

Another challenge is that the table has more than 250MM rows. So step 2 needs to be distributed.

I have used Data Pipeline before to back up data from DynamoDB to S3 on a weekly schedule, so Data Pipeline seems to be a feasible choice for this job (see the last paragraph for its limitations). Unfortunately there is no template I can reuse, so here is the configuration I set up:

[Screenshot: the Data Pipeline configuration]

CopyActivity (type HiveCopyActivity) takes DynamoDBFrom as its input (data format: DynamoDBExportDataFormat) and DynamoDBTo as its output. It runs on EMRResource (type EmrCluster) with the following parameters:

Master Instance Type: m3.2xlarge
Core Instance Type: m3.2xlarge
Core Instance Count: 10
Ami Version: 3.11.0
Terminate After: 24 hours
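For readers who prefer a definition file over the console, the setup above could look roughly like the sketch below, which builds the pipeline objects as a Python dictionary and prints them as JSON. Treat it as an approximation rather than my exact pipeline: the field names follow my understanding of the Data Pipeline definition syntax, the id `DynamoDBExportFormat` and the table names are made up, and the schedule, roles and log location that a real pipeline needs are left out.

```python
import json


def build_pipeline_definition(source_table, target_table):
    """Build a pipeline definition mirroring the configuration described above."""
    objects = [
        {   # Data format attached to the input node (the id is a made-up name).
            "id": "DynamoDBExportFormat",
            "name": "DynamoDBExportFormat",
            "type": "DynamoDBExportDataFormat",
        },
        {   # Source table.
            "id": "DynamoDBFrom",
            "name": "DynamoDBFrom",
            "type": "DynamoDBDataNode",
            "tableName": source_table,
            "dataFormat": {"ref": "DynamoDBExportFormat"},
        },
        {   # Destination table.
            "id": "DynamoDBTo",
            "name": "DynamoDBTo",
            "type": "DynamoDBDataNode",
            "tableName": target_table,
        },
        {   # EMR cluster with the parameters listed above.
            "id": "EMRResource",
            "name": "EMRResource",
            "type": "EmrCluster",
            "masterInstanceType": "m3.2xlarge",
            "coreInstanceType": "m3.2xlarge",
            "coreInstanceCount": "10",
            "amiVersion": "3.11.0",
            "terminateAfter": "24 Hours",
        },
        {   # The copy itself, wired to the nodes and the cluster by reference.
            "id": "CopyActivity",
            "name": "CopyActivity",
            "type": "HiveCopyActivity",
            "input": {"ref": "DynamoDBFrom"},
            "output": {"ref": "DynamoDBTo"},
            "runsOn": {"ref": "EMRResource"},
        },
    ]
    return {"objects": objects}


# Hypothetical table names, for illustration only.
definition = build_pipeline_definition("table_a", "table_b")
print(json.dumps(definition, indent=2))
```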

That’s it, save and no errors! Activate the job!

Edit: Apparently this solution will not scale either. When I compared the sizes of the source and destination tables, there was a significant difference. I had tested with a small table of 1MM records and it worked fine. My guess is that the way the distributed scan operation works on DynamoDB is not consistent.
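One way to double-check such a discrepancy is to count the items in both tables with a full scan. The item count reported by DescribeTable is only refreshed periodically (roughly every six hours, according to the AWS documentation), so it can be stale right after a bulk load. Below is a minimal sketch; it assumes `client` is a boto3 DynamoDB client (or anything with a compatible `scan` method), and the table names in the usage comment are made up.

```python
def count_items(client, table_name):
    """Count the items in a table by paging through a COUNT-only scan."""
    total = 0
    kwargs = {"TableName": table_name, "Select": "COUNT"}
    while True:
        page = client.scan(**kwargs)
        total += page["Count"]
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            # No more pages: the scan has covered the whole table.
            return total
        # Resume the scan where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


# Usage, with boto3 installed and credentials configured:
#   import boto3
#   client = boto3.client("dynamodb")
#   print(count_items(client, "table_a"), count_items(client, "table_b"))
```

Keep in mind that a full scan still reads every item, so on a table of this size it consumes a lot of read capacity and takes a while.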
