Wednesday, March 18, 2009

Simple Savant: .NET Object-Persistence Framework for Amazon SimpleDB

I'm building an application that stores all of its structured data in Amazon's SimpleDB service. When I started on the overall architecture, I searched for recommendations on designing applications specifically for SimpleDB or similar services. I didn't find many tips, but I did find plenty of complaints about SimpleDB's disadvantages compared with mature RDBMS products. I also discovered that the available .NET interfaces to SimpleDB were fairly low-level and did little to overcome these inherent deficiencies.

So I put together a list of the higher-level features that would simplify building an application with SimpleDB and built many of these features on top of the Amazon C# Library for SimpleDB. The result is the Simple Savant .NET library (written in C#) which I've open-sourced at CodePlex.

So what are the biggest hurdles when designing for SimpleDB vs an RDBMS? I came up with the following list:

  • No transactions.
  • No numeric or date/time types. All attributes are stored as text strings so special formatting is required to support sorting and searching.
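  • Truncation of query results. A single select may return only part of the matching items, cut off by item count, response size, or execution time, together with a NextToken for requesting the rest.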
  • Eventual consistency. Item modifications are not guaranteed to be immediately visible on successive requests.
  • No full-text searching.
  • Attribute values are limited to 1024 bytes.

(You could throw in additional deficiencies on the administration/reporting side, but my focus here is on application design.)

The first release of Simple Savant addresses several of these fundamental issues and dramatically reduces the level of effort required to work with SimpleDB. Features include:

  • Mapping object properties to SimpleDB attributes.
  • Formatting of basic .NET data types to support lexicographical sorts and searches.
  • Support for ADO.NET-style parameterized select operations (including formatting and escaping of parameter values).
  • Unlimited select results in a single call.
  • Transparent caching on Get and Put operations to mitigate the effects of SimpleDB's eventual consistency model.
  • Automatic domain creation.
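To show how the caching feature can hide eventual consistency from the code doing the writing, here is a minimal write-through cache in C#. It is a sketch only and does not reproduce Simple Savant's implementation. IItemStore, LaggingStore and CachingItemStore are hypothetical names. LaggingStore simply simulates eventual consistency by holding writes back until Propagate() is called.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal store interface; this is not Simple Savant's API.
public interface IItemStore
{
    void Put(string domain, string itemName, IDictionary<string, string> attributes);
    IDictionary<string, string> Get(string domain, string itemName); // null if not visible yet
}

// Simulates eventual consistency: writes stay pending until Propagate() is called.
public class LaggingStore : IItemStore
{
    private readonly Dictionary<(string, string), IDictionary<string, string>> visible =
        new Dictionary<(string, string), IDictionary<string, string>>();
    private readonly List<(string, string, IDictionary<string, string>)> pending =
        new List<(string, string, IDictionary<string, string>)>();

    public void Put(string domain, string itemName, IDictionary<string, string> attributes) =>
        pending.Add((domain, itemName, attributes));

    public IDictionary<string, string> Get(string domain, string itemName) =>
        visible.TryGetValue((domain, itemName), out var attrs) ? attrs : null;

    public void Propagate()
    {
        foreach (var (domain, item, attrs) in pending) visible[(domain, item)] = attrs;
        pending.Clear();
    }
}

// Write-through cache: Put updates the backing store and then a local copy, and Get
// serves that copy first, so this process always reads its own recent writes.
public class CachingItemStore : IItemStore
{
    private readonly IItemStore inner;
    private readonly Dictionary<(string, string), IDictionary<string, string>> cache =
        new Dictionary<(string, string), IDictionary<string, string>>();

    public CachingItemStore(IItemStore inner) => this.inner = inner;

    public void Put(string domain, string itemName, IDictionary<string, string> attributes)
    {
        inner.Put(domain, itemName, attributes);
        cache[(domain, itemName)] = new Dictionary<string, string>(attributes);
    }

    public IDictionary<string, string> Get(string domain, string itemName)
    {
        if (cache.TryGetValue((domain, itemName), out var cached))
            return new Dictionary<string, string>(cached); // defensive copy
        return inner.Get(domain, itemName);                // fall back to the store
    }
}
```

The cache lives inside a single process, so it only helps that process read its own writes. Other processes and servers still see SimpleDB's eventually consistent view.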

Using Simple Savant

Let's start by designing a class to hold information about a person:

    using System;

    namespace Coditate.Savant.ConsoleSample
    {
        // Store instances in the "Person" domain instead of the default "PersonItem".
        [DomainName("Person")]
        public class PersonItem
        {
            // Used as the SimpleDB item name for Get, Put, and Delete operations.
            [ItemName]
            public Guid Id { get; set; }

            public string FirstName { get; set; }

            public string LastName { get; set; }

            public string EmailAddress { get; set; }

            public DateTime BirthDate { get; set; }

            public float Height { get; set; }

            public float Weight { get; set; }

            public WeightUnit WeightUnit { get; set; }

            public HeightUnit HeightUnit { get; set; }

            public override string ToString()
            {
                // "\r\n" is the Windows line break ("\n\r" reverses the pair).
                return string.Format("\r\n\tName: \t\t{0}, {1}\r\n\tEmailAddress: \t{2}\r\n\tId: \t\t{3}",
                                     LastName, FirstName, EmailAddress, Id);
            }
        }

        public enum WeightUnit
        {
            Pounds,
            Kilograms
        }

        public enum HeightUnit
        {
            Inches,
            Meters
        }
    }

This is just about the simplest possible class we could use with Simple Savant. The ItemName attribute attached to PersonItem.Id is the only customization required for storing person instances in SimpleDB. It tells Simple Savant which property to use as the SimpleDB item name for Get, Put, and Delete operations.

The class-level DomainName attribute is optional. It lets us customize the SimpleDB domain where instances are stored. By default, the class name is used as the domain name. Without the DomainName attribute, person instances would be stored in the "PersonItem" domain. With it, they are stored in the "Person" domain.
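Resolving the domain name and item name comes down to a little reflection. The sketch below uses attribute classes named SketchDomainName and SketchItemName. These are hypothetical stand-ins, defined here only so the example compiles, because Simple Savant ships its own DomainName and ItemName attributes. The sketch applies the defaulting rule just described.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Reflection;

// Hypothetical stand-ins for Simple Savant's DomainName and ItemName attributes.
[AttributeUsage(AttributeTargets.Class)]
public sealed class SketchDomainNameAttribute : Attribute
{
    public SketchDomainNameAttribute(string name) { Name = name; }
    public string Name { get; }
}

[AttributeUsage(AttributeTargets.Property)]
public sealed class SketchItemNameAttribute : Attribute { }

public static class NameResolver
{
    // The domain is the attribute's value when present, otherwise the class name.
    public static string ResolveDomainName(Type type) =>
        type.GetCustomAttribute<SketchDomainNameAttribute>()?.Name ?? type.Name;

    // The item name is the invariant string form of the single [SketchItemName] property.
    public static string ResolveItemName(object item)
    {
        PropertyInfo idProperty = item.GetType().GetProperties()
            .SingleOrDefault(p => p.IsDefined(typeof(SketchItemNameAttribute), true));
        if (idProperty == null)
            throw new InvalidOperationException(item.GetType().Name + " has no item-name property.");
        return Convert.ToString(idProperty.GetValue(item), CultureInfo.InvariantCulture);
    }
}

// Example classes mirroring the PersonItem discussion.
[SketchDomainName("Person")]
public class PersonWithDomain
{
    [SketchItemName] public Guid Id { get; set; }
}

public class PersonWithoutDomain
{
    [SketchItemName] public Guid Id { get; set; }
}
```

Requiring exactly one item-name property (SingleOrDefault throws if there are several) keeps the mapping unambiguous for Get, Put and Delete.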

Next let's populate a PersonItem and put it in SimpleDB:

    PersonItem person = new PersonItem
        {
            BirthDate = new DateTime(1972, 1, 15),
            EmailAddress = "bob@example.com",
            FirstName = "Bob",
            Height = 72.5f,
            HeightUnit = HeightUnit.Inches,
            Id = Guid.NewGuid(),
            LastName = "Smith",
            Weight = 200,
            WeightUnit = WeightUnit.Pounds
        };

    // Replace these placeholders with your own AWS credentials.
    string awsAccessKeyId = "xxxxxxx";
    string awsSecretAccessKey = "xxxxxxx";
    SimpleSavant savant = new SimpleSavant(awsAccessKeyId, awsSecretAccessKey);

    // Store the person as an item in the "Person" domain.
    savant.Put(person);

Once we've populated our person object it takes just a few lines of code to configure Simple Savant and send our person to SimpleDB!

Here's what we'd find in SimpleDB after executing this code:

    Name            Value
    ItemName        c9748367-c929-408d-b7cf-60b3677717cf
    EmailAddress    bob@example.com
    Height          1072.5
    Weight          1200
    HeightUnit      Inches
    WeightUnit      Pounds
    BirthDate       1972-01-15T00:00:00.000-05:00
    FirstName       Bob
    LastName        Smith

By default, property names are used as SimpleDB attribute names, but you can customize them using the AttributeName attribute. Other points of note:

  • Dates are formatted using the ISO 8601 standard to support lexicographical ordering.
  • Unsigned numeric types are zero-padded to support lexicographical ordering.
  • Signed numeric types are offset to support lexicographical ordering when positive and negative values are mixed together. The default offset value is 10 raised to the nth power, where n is the maximum number of whole digits supported by the type. This is why our height value of 72.5 is stored as 1072.5 and our weight value of 200 is stored as 1200. (By default, float (System.Single) values are formatted with three whole digits and four decimal digits, so the offset is 10^3 = 1000.)
  • All properties are mapped to SimpleDB attributes by default. You can customize this behavior using the SavantInclude and SavantExclude attributes.
  • Property formatting can be customized using the CustomFormat and NumberFormat attributes. For example, we could easily customize PersonItem.Weight to be formatted with five whole digits and two decimal digits to increase the maximum possible weight value from 999 to 99,999.
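The following C# sketch illustrates the three formatting rules above. The class and method names are hypothetical. The offset format is chosen to reproduce the Height and Weight values in the table, not copied from Simple Savant's source. That format uses one whole-digit position more than the type's maximum, to hold the offset's leading 1, keeps up to four decimals and trims trailing zeros.

```csharp
using System;
using System.Globalization;

// Hypothetical helper illustrating the formatting rules; not Simple Savant's code.
public static class LexicographicFormat
{
    // Dates: fixed-width ISO 8601. Strings sort chronologically only when they
    // share the same UTC offset, so normalize values consistently before storing.
    public static string FormatDate(DateTimeOffset value) =>
        value.ToString("yyyy-MM-dd'T'HH:mm:ss.fffzzz", CultureInfo.InvariantCulture);

    // Unsigned integers: zero-pad to the width of the type's maximum value
    // (uint.MaxValue has 10 digits).
    public static string FormatUnsigned(uint value) =>
        value.ToString("D10", CultureInfo.InvariantCulture);

    // Signed numbers: add 10^wholeDigits so every in-range value becomes positive,
    // then pad the whole part to wholeDigits + 1 positions. Up to decimalDigits
    // decimals are kept and trailing zeros are trimmed.
    public static string FormatOffset(decimal value, int wholeDigits = 3, int decimalDigits = 4)
    {
        decimal offset = 1m;
        for (int i = 0; i < wholeDigits; i++) offset *= 10m;

        if (Math.Abs(value) >= offset)
            throw new ArgumentOutOfRangeException(nameof(value), "Too many whole digits for this format.");

        string format = new string('0', wholeDigits + 1) + "." + new string('#', decimalDigits);
        return (value + offset).ToString(format, CultureInfo.InvariantCulture);
    }
}
```

For example, FormatOffset((decimal)person.Height) yields 1072.5 for the person stored above, and FormatOffset(-5.25m) yields 0994.75. The negative value therefore sorts before 1003, the formatted value of 3. Trimming trailing zeros is safe for ordering because the whole part always has the same width.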

Getting our person back from SimpleDB takes just a couple more lines of code:

    Guid personId = person.Id;
    PersonItem person2 = savant.Get<PersonItem>(personId);

If we added many people to our person domain and needed to select them all we could do so like this:

    IList<PersonItem> allPeople = savant.Select<PersonItem>("select * from Person");

Note that this query would return all items in the domain. Simple Savant requests all available pages of results from SimpleDB unless you explicitly limit results using properties on SelectCommand. See the API documentation for more details on this.
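SimpleDB truncates select responses and returns a NextToken for the remainder, so "unlimited" results come from a loop like the one sketched below. The fetchPage delegate stands in for whatever actually issues the select request, such as a call through the Amazon C# library. Page<T> and Paging.SelectAll are hypothetical names. The optional maxCount parameter mimics explicitly limiting results.

```csharp
using System;
using System.Collections.Generic;

// One page of select results plus the continuation token (null on the last page).
public sealed class Page<T>
{
    public Page(IList<T> items, string nextToken)
    {
        Items = items;
        NextToken = nextToken;
    }

    public IList<T> Items { get; }
    public string NextToken { get; }
}

public static class Paging
{
    // Request pages until no NextToken comes back, or until maxCount items
    // have been collected. fetchPage receives null for the first request.
    public static List<T> SelectAll<T>(Func<string, Page<T>> fetchPage, int? maxCount = null)
    {
        var results = new List<T>();
        string token = null;
        do
        {
            Page<T> page = fetchPage(token);
            results.AddRange(page.Items);
            token = page.NextToken;
        } while (token != null && (maxCount == null || results.Count < maxCount));

        // The last page may overshoot the requested limit; trim the extras.
        if (maxCount.HasValue && results.Count > maxCount.Value)
            results.RemoveRange(maxCount.Value, results.Count - maxCount.Value);
        return results;
    }
}
```

Passing the token back unchanged, rather than computing offsets, matches how SimpleDB continuation works. The service decides where each page ends.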

Finally, to run a more sophisticated range query that finds all people born during the 1980s, we can use a parameterized command:

    SelectCommand<PersonItem> command = new SelectCommand<PersonItem>(
        "select * from Person where BirthDate between @StartDate and @EndDate");
    // Each parameter is tied to the BirthDate property, so its value is formatted like stored dates.
    command.AddParameter(new CommandParameter("StartDate", "BirthDate", new DateTime(1980, 1, 1)));
    command.AddParameter(new CommandParameter("EndDate", "BirthDate", new DateTime(1989, 12, 31)));

    SelectResults<PersonItem> eightiesPeople = savant.Select(command);

The same formatting rules are applied to select parameters as to Get and Put operations on the Person domain. In other words, if we defined custom formatting behavior for PersonItem.BirthDate, the same rules would be used for select operations involving the BirthDate attribute, and the query above would still just work.
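Parameter binding can be sketched as token substitution: each @Name is replaced with a quoted, escaped literal. This C# sketch is not Simple Savant's implementation, and ParameterBinding.Bind is a hypothetical name. It assumes values have already been formatted with the same rules used for Put. It escapes embedded single quotes by doubling them, which is how SimpleDB's select syntax escapes quotes inside literals.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParameterBinding
{
    // Replace each @Name token with a quoted SimpleDB literal built from a
    // pre-formatted value; embedded single quotes are escaped by doubling them.
    // Limitation: this simple scan would also match "@..." inside literals
    // already present in the query text.
    public static string Bind(string query, IDictionary<string, string> formattedValues)
    {
        return Regex.Replace(query, @"@(\w+)", match =>
        {
            string name = match.Groups[1].Value;
            if (!formattedValues.TryGetValue(name, out string value))
                throw new KeyNotFoundException("No value supplied for parameter @" + name + ".");
            return "'" + value.Replace("'", "''") + "'";
        });
    }
}
```

Doing the escaping in one place means callers never paste raw user input into the query. A last name such as O'Brien becomes 'O''Brien' rather than breaking the statement.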

Hopefully this post will help you get started using Simple Savant. The CodePlex release includes sample code and full API documentation that should keep you going.

UPDATE: Here is a brief tutorial on the new features in Simple Savant v0.2.

11 comments:

Anonymous said...

This looks really neat! I am going to take a look at using this in my projects.

AT said...

Thanks! I'll be releasing a new version with significant enhancements in a week or two so check back soon.

Hugo said...

I find this library really useful. Thanks for your efforts.

AT said...

@Hugo: Thanks for the feedback and kind words!

Nick said...

This looks like a fantastic project. I've twittered about it because I think more people should know!

Many thanks, I'll be using this in my next project for sure. I'll let you know how it goes.

Thanks,
Nick

AT said...

@Nick: Thanks much. Things have been quiet lately, but more enhancements will come soon. I'm now under NDA with Amazon and so can keep SS releases in sync with new SimpleDB features as necessary.

Anonymous said...

Nick,

I've done something similar for www.hearmytweet.com rolling my own. I did because I had to and I didn't see your solution. Of course, I did not make it as elegant with your meta declarations. Since you have already gone to the trouble of making this, could you go a step further and put in LINQ compatibility? That would really be something and I would use it.

AT said...

@Anon: At this point it is unlikely that I will add Linq support to Simple Savant--it is not on the proposed feature list right now.

You might be interested in checking out this project:
http://linqtosimpledb.codeplex.com/

I'm not sure how mature it is or if it's being actively developed, but it does Linq to SimpleDB.

timb said...

This is rocking! Good work, thanks for sharing.

AT said...

@timb: Thanks!

Webyacusa said...
This comment has been removed by the author.
 
Header photo courtesy of: http://www.flickr.com/photos/tmartin/ / CC BY-NC 2.0