Design Patterns for data persistence: Unit-of-Work Pattern and Repository Pattern

4/18/2012 code

Design Patterns

Design Patterns are names we give code constructs. We’re not usually inventing anything new, we’re just giving a name for things we’ve always done. You’ve probably been doing it this way forever, but you never knew what to call it, it was just “the way it’s done”. Giving it a name is very powerful as it gives us a way to communicate about larger topics. Yeah, we can talk about for-loop, if-block, but what about system architecture topics? In the world of auto repair, they have names for their larger pieces: radiator, engine block, alternator. How do you convey to someone how your Data Access tier or your Presentation tier works? This is the magic of Design Patterns – names for these common execution techniques.

Today we’ll discuss two common Design Patterns for working with data storage: the Unit-of-Work Pattern, and the Repository Pattern.

Unit-ofWork Pattern

Microsoft really likes the Unit-of-Work Pattern, so most every sample you see coming out of their shops will include this pattern. The general theory is you have a reference to a stateful link to your data store – a Data Context – that will queue up your queries and then execute all the steps at once within a transaction. They’ll either all succeed or they’ll all fail.

For example you’re placing an order in an Order Entry system. You may insert or update the Customer record, insert an Order header, insert one or more Order detail lines, perhaps update the product’s available count. You want all of these writes to either succeed together or all fail. The theory goes that you don’t want to get the first 2 lines of the order saved, but then error trying to write line #3 and fail to update the customer or product. You want to be confident that all these steps will succeed or fail.

A typical method that uses the Unit of Work Pattern would look like so:

public int SaveOrder(int CustomerId, List<OrderDetail> OrderLines) {

    MyDataContext db = new MyDataContext();

    Customer c = db.Customers.First( c => c.ID == CustomerId );

    Order o = new Order {
        OrderDate = DateTime.Now,
        CustomerId = CustomerId,
    };

    c.LastOrderDate = o.OrderDate;

    db.Orders.Insert( o );
    db.OrderDetails.InsertRange( OrderLines );

    db.SaveChanges();

    return o.ID;
}

You can always spot the Unit-of-Work pattern with it’s tell-tale “now go do it” method – in this case named “SaveChanges”.

I could refactor this into 2 or 3 methods, one that creates the order, one that updates the customer, and all the while hang onto that stateful Data connection until I finally complete that work unit. These methods could call different stored procedures to do each step if need be. But good, bad, or indifferent, I must keep track of my handle to my work package until I’m done, then I must specifically say “done”.

Why use the Unit-of-Work pattern rather than just running each query separately? Because I’m confident that the entire work unit will either succeed or fail, I’ll never get caught with an inconsistent data state.

Repository Pattern

The Repository Pattern’s focus is to create an abstraction between the data store and the application so the application doesn’t need to think about how it’s stored, only that this widget stores it. The Repository is responsible for doing all the wackiness of connecting to the data store, opening the db connection, forming the query parameters, containing the open connection, etc. A Repository class has methods that take in objects or parameters, and return either an object or a list of objects. A Repository method doesn’t do business logic outside of simple validation. It just shims data to the data store.

A typical class that uses the Repository Pattern would look like so:

public class CustomerRepository {

    public Customer GetById( int CustomerId ) {
        using ( MyDataContext db = new MyDataContext() ) {
            return (
                from c in db.Customers
                where c.ID == CustomerId
                select c
            ).FirstOrDefault();
        }
    }

    public void Save( Customer c ) {
        using ( MyDataContext db = new MyDataContext() ) {
            if ( c.ID < 1 ) {
                db.Add( c );
            } else {
                db.Update( c );
            }
            db.SaveChanges(); // Sadly, Microsoft's ORM is Unit-of-Work
            // so we're just saying “my unit is this method”.
        }
    }

}

Usage of this class is pretty straight forward:

public void ChangeCustomerName( int CustomerId, string FirstName, string LastName ) {
    CustomerRepository r = new CustomerRepository(); // Since the Repository is stateless,
    // this could be a property on this class rather than a new instance
    Customer c = r.GetById( CustomerId );
    c.FirstName = FirstName;
    c.LastName = LastName;
    r.Save( c );
}

Notice that the consuming class doesn’t need to know if GetById() was a stored procedure, or used an XML file or called an external service. It just asked for something, passing in simple parameters, and got back a result.

Why use the Repository Pattern rather than just create an instance of the DataContext everywhere and just query stuff? Here’s a few reasons:

We can easily evaluate data usage, and optimize queries and indexes, because all queries are very carefully defined in a specific section of the code.
Data connections are opened and closed within the method, so no connections leak. (Yes, .net’s connection pooling insures we’re not actually hammering the db.)
If we ever need to swap data access strategies (LinqToSql to Entity Framework for example), we likely replace the DataContext class, change the Repository methods to call the new Context’s methods, but nothing else in the app needs to change. OK, if your Data Access strategy requires your objects derive from something, you’ll need to adjust that too. But the beauty of things like Entity Framework Code First or NHibernate or Dapper is that your entity classes don’t need to derive from anything funky – they’re just plain old C# objects.

A composite Unit-of-Work and Repository Pattern

Well we’ve painted a nice picture of each of the patterns and their benefits, but what if I want the best of both worlds? What if I want to know everything in various tables and methods happens at once but I want that clean separation that the Repository gives me? That turns out to be a pretty simple adjustment, though often it’s overkill for the task at hand. But most of the “Repository Pattern” samples from Microsoft will actually do it this way.

Here’s an example of the hybrid Unit-of-Work / Repository Pattern:

public class BaseRepository {

    public class UnitOfWorkHandle {
        internal MyDataContext db { get; set; }
    }

    public UnitOfWorkHandle StartUnitOfWork() {
        return new UnitOfWorkHandle {
            db = new MyDataContext();
        };
    }

    public void FinishUnitOfWork() {
        db.SaveChanges();
    }

}

public class CustomerRepository : BaseRepository {

    public Customer GetById( UnitOfWorkHandle Handle, int CustomerId ) {
        return (
            from c in Handle.db.Customers
            where c.ID == CustomerId
            select c
        ).FirstOrDefault();
    }

    public void Save( UnitOfWorkHandle Handle, Customer c ) {
        if ( c.ID < 1 ) {
            Handle.db.Add( c );
        } else {
            Handle.db.Update( c );
        }
    }

}

This is pretty cool:

The actual implementation details of my data access strategy is hidden (and thus easily replaceable)
All the queries are clearly defined and easily auditable
I’m confident that either everything will succeed or everything will fail

It has some draw-backs though:

Usage is pretty complex:

   public void ChangeCustomerName( int CustomerId, string FirstName, string LastName ) {
       CustomerRepository r = new CustomerRepository();
       UnitOfWorkHandle h = r.StartUnitOfWork();
       Customer c = r.GetById( h, CustomerId );
       c.FirstName = FirstName;
       c.LastName = LastName;
       r.Save( h, c );
       r.FinishUnitOfWork();
   }

If I forget to call FinishUnitOfWork() absolutely nothing happens: no error, no warning, no data is saved, it just goes poof, and sadly, it’s very easy to do.

Which to Use?

Ultimately the choice of which Data Access Pattern you use (similarly to your decision of which ORM / data transport technology you use) depends on the situation. Do you want clean separation of concerns between your tiers? Do you need to know that everything is executed together? Would you rather a simpler interface to getting data? Some even argue that Microsoft’s ORM tools are themselves the “Data Layer”, and that any pattern that wraps it is wasteful. Choose the strategy that gives you the best value for the needs of the project: speed of development, maintainable code, legible code, data access speed, etc.

Rob