Thoughts on Software Development - Partitioning Azure Table Storage

Partitioning Azure Table Storage

How you divide your Azure table storage into partitions depends on how your data is accessed. Here is an example of how to partition data, assuming that reads predominate over writes.

Consider an application that sells tickets to various events. Typical questions and the attributes accessed for the queries are:

Question	Attributes
How many tickets are left for an event?	date, location, event
What events occur on which date?	date, artist, location
When is a particular artist coming to town?	artist, location
When can I get a ticket for a type of event?	genre
Which artists are coming to town?	artist, location

The queries are listed in frequency order. The most common query is about how many tickets are available for an event.

The most common combination of attributes is location paired with either artist or date. The most common query uses event, date, and location.
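The tally behind these observations can be reproduced with a few lines of Python. This is only a sketch: the names QUERIES and attribute_usage are made up for illustration, and the attribute lists are copied from the list of questions above.

```python
from collections import Counter

# Each query and the attributes it touches, in the frequency order given above.
QUERIES = [
    ("How many tickets are left for an event?", ("date", "location", "event")),
    ("What events occur on which date?", ("date", "artist", "location")),
    ("When is a particular artist coming to town?", ("artist", "location")),
    ("When can I get a ticket for a type of event?", ("genre",)),
    ("Which artists are coming to town?", ("artist", "location")),
]


def attribute_usage(queries):
    """Count how many queries touch each attribute."""
    usage = Counter()
    for _question, attributes in queries:
        usage.update(attributes)
    return usage


usage = attribute_usage(QUERIES)
for attribute, count in usage.most_common():
    print(f"{attribute}: used by {count} of {len(QUERIES)} queries")
```

Location comes out on top, appearing in four of the five queries, followed by artist in three and date in two.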

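With Azure tables you only have two keys: the partition key and the row key. Together they form the only index on a table. The fastest query is a point query that specifies both keys exactly. The next fastest specifies the partition key and a range of row keys. A query that does not specify the partition key has to scan every partition.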
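To make the difference concrete, here is a rough sketch of the filter strings such queries would send. The Table service accepts OData-style filters over PartitionKey, RowKey, and other properties. The helper names below (quote, point_filter, row_range_filter, property_filter) and the example values are my own, not part of any SDK.

```python
def quote(value):
    """Quote a string literal for a filter; embedded single quotes are doubled."""
    return "'" + value.replace("'", "''") + "'"


def point_filter(partition_key, row_key):
    """Both keys given exactly: a point query, the fastest kind."""
    return f"PartitionKey eq {quote(partition_key)} and RowKey eq {quote(row_key)}"


def row_range_filter(partition_key, row_from, row_to):
    """One partition, a range of row keys: row_from inclusive, row_to exclusive."""
    return (
        f"PartitionKey eq {quote(partition_key)} "
        f"and RowKey ge {quote(row_from)} and RowKey lt {quote(row_to)}"
    )


def property_filter(name, value):
    """No partition key at all: every partition has to be scanned."""
    return f"{name} eq {quote(value)}"


print(point_filter("some-partition", "some-row"))
print(row_range_filter("some-partition", "a", "b"))
print(property_filter("Color", "red"))
```

Only the first two filters name a partition, so only they let the service go straight to the data.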

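This suggests that the partition key should be location, since it is involved in all but one of the queries. The row key should be date concatenated with event, which turns the most common query into a lookup on both keys. The remaining queries require scans, but all except one are helped by the partitioning scheme. Because they specify a location, they scan a single partition rather than the whole table. Because the row key begins with the date, a query by date only has to read a range of row keys, provided the date is written in a sortable form such as YYYY-MM-DD. The exception is the query by genre, which has to scan the entire table. In reality, that query is probably location based as well.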
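Here is a small in-memory model of that scheme, as a sketch rather than real storage code. A dict of dicts stands in for the table, the sample events are invented, and the underscore separator in the row key is my assumption (the service does not allow '/', '\', '#', or '?' in keys, so event names containing those would need cleaning first).

```python
from datetime import date

SEPARATOR = "_"  # assumed separator between date and event in the row key


def make_keys(location, event_date, event):
    """Return (PartitionKey, RowKey): location, then ISO date + event name."""
    return location, f"{event_date.isoformat()}{SEPARATOR}{event}"


def build_table(events):
    """Model the table as {partition key: {row key: entity}}."""
    table = {}
    for entity in events:
        pk, rk = make_keys(entity["location"], entity["date"], entity["event"])
        table.setdefault(pk, {})[rk] = entity
    return table


def tickets_left(table, location, event_date, event):
    """Most common query: both keys known, so a direct lookup."""
    pk, rk = make_keys(location, event_date, event)
    return table[pk][rk]["tickets_left"]


def events_on(table, location, event_date):
    """Row keys start with the date, so this touches one partition only.
    (Real storage would use a row key range; startswith stands in for it.)"""
    prefix = event_date.isoformat() + SEPARATOR
    partition = table.get(location, {})
    return sorted(e["event"] for rk, e in partition.items() if rk.startswith(prefix))


def dates_for_artist(table, location, artist):
    """Artist is not in either key: scan one partition, not the whole table."""
    partition = table.get(location, {})
    return sorted(e["date"] for e in partition.values() if e["artist"] == artist)


def events_in_genre(table, genre):
    """No location given: every partition has to be scanned."""
    return sorted(
        e["event"]
        for partition in table.values()
        for e in partition.values()
        if e["genre"] == genre
    )


# Invented sample data, for illustration only.
EVENTS = [
    {"location": "Seattle", "date": date(2024, 6, 1), "event": "Jazz Night",
     "artist": "Ada Quartet", "genre": "jazz", "tickets_left": 120},
    {"location": "Seattle", "date": date(2024, 6, 1), "event": "Rock Fest",
     "artist": "The Amps", "genre": "rock", "tickets_left": 0},
    {"location": "Seattle", "date": date(2024, 7, 4), "event": "Jazz Brunch",
     "artist": "Ada Quartet", "genre": "jazz", "tickets_left": 35},
    {"location": "Portland", "date": date(2024, 6, 1), "event": "Jazz Night",
     "artist": "Ada Quartet", "genre": "jazz", "tickets_left": 10},
]

table = build_table(EVENTS)
print(tickets_left(table, "Seattle", date(2024, 6, 1), "Jazz Night"))  # 120
print(events_on(table, "Seattle", date(2024, 6, 1)))  # ['Jazz Night', 'Rock Fest']
print(dates_for_artist(table, "Seattle", "Ada Quartet"))
print(events_in_genre(table, "jazz"))
```

Writing the date in ISO form matters: it makes the string order of the row keys match calendar order, which is what lets a date query be answered from a contiguous range of rows.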

An added bonus of this arrangement is that it lends itself to geographic distribution: because all the data for a location is kept together, each location's data can be placed in the data center closest to its customers.