Monday, October 19, 2009

Using Stub Repositories for Fast Testing

In this post, I would like to describe a technique for testing, and at times debugging, that I have recently employed. I have found it very useful for fast feedback and would like to share my experiences.

Recently I was looking for a better, faster way to run my tests. Some of the problems I had to overcome were:
  1. Slow Testing
    I have a large suite of user acceptance tests. Many of these tests define setup criteria (which effectively pre-populate a database), execute the test by modifying the system in some manner, and then tear down the test, including a database reset (deleting all the data from the databases). As you know, and have most likely experienced, this can be frustratingly slow, especially on large volumes of data.
  2. Difficult Debugging
    When breaking into the execution of a test, I would like to see the state of my system. If my repositories have persisted their data to a database, it can be difficult for me to get at that data, especially in the middle of an uncommitted transaction.
So what I needed was to improve the overall speed of my build, and to be able to take a snapshot of my repositories at any point in time. I achieved this using Stub Repositories.

In the book 'Domain-Driven Design' by Eric Evans, repositories are described as follows:

"A repository represents all objects of a certain type as a conceptual set (usually emulated). It acts like a collection, except with more elaborate querying capability. Objects of the appropriate type are added and removed, and the machinery behind the repository inserts them or deletes them from the database."

In practice, I use a repository to find existing objects, and also to persist newly created ones. Without getting into too much detail, it is used as the layer between my application and the database. The behaviour of a repository is to query the database for an object that meets the criteria of my input, and to restore the saved state of that object from the data that it has found. I use repositories to manage instances of in-memory objects.

A stub is, according to Martin Fowler:

"Stubs provide canned answers to calls made during the test, usually not responding at all to anything outside what's programmed in for the test. Stubs may also record information about calls, such as an email gateway stub that remembers the messages it 'sent', or maybe only how many messages it 'sent'."

By combining these two concepts, I get what I've referred to as a StubRepository (can you see what I did there?).

For any repository implementation in my application, I define an interface describing all the query and collection management actions that can be performed on a particular type of entity. Once I have this interface defined, I can then implement both a real version of it that uses a real database, and a stub version that is purely in memory.

As an example, let's define an Employee Repository interface.

public interface EmployeeRepository {
    Employee findBy(EmployeeNumber number);
    List<Employee> findBy(EmployerName employer);
    void add(Employee employee);
    void delete(Employee employee);
}

A typical HibernateEmployeeRepository (which of course, implements EmployeeRepository) could look like this:

public class HibernateEmployeeRepository extends HibernateRepository
        implements EmployeeRepository {
    public HibernateEmployeeRepository(SessionProvider sessionFactory) {
        super(Employee.class, sessionFactory);
    }

    public Employee findBy(EmployeeNumber number) {
        return findUnique(eq("number", number));
    }

    public List<Employee> findBy(EmployerName employer) {
        return find(eq("employer", employer));
    }

    public void add(Employee employee) {
        getCurrentSession().save(employee);
    }

    public void delete(Employee employee) {
        getCurrentSession().delete(employee);
    }
}

So long as my application only ever uses the interface (beyond repository creation), it can query, add and delete employee objects without having to know what database magic is happening under the hood. And because I have this well-defined interface, I can now create a stub version of it for use in testing. I generally do this using maps and collections. Because a repository has collection semantics, a collection should be all that is required to implement the stub. If not, this may be a code smell.

So let's implement a stub version of the employee repository interface.

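import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A pure in-memory stub: it implements the interface only and has no
// dependency on the hibernate base class.
public class StubEmployeeRepository implements EmployeeRepository {
    private final List<Employee> employees = new ArrayList<Employee>();
    private final Map<EmployeeNumber, Employee> byNumber =
            new HashMap<EmployeeNumber, Employee>();
    // An employer can have many employees, so each key maps to a list.
    private final Map<EmployerName, List<Employee>> byEmployer =
            new HashMap<EmployerName, List<Employee>>();

    public Employee findBy(EmployeeNumber number) {
        return byNumber.get(number);
    }

    public List<Employee> findBy(EmployerName employer) {
        List<Employee> found = byEmployer.get(employer);
        // Return a copy (empty if nothing matches) so callers cannot alter the stub.
        return found == null ? new ArrayList<Employee>() : new ArrayList<Employee>(found);
    }

    public void add(Employee employee) {
        employees.add(employee);
        byNumber.put(employee.getNumber(), employee);
        // Assumes getEmployer() returns the EmployerName used by the interface.
        List<Employee> sameEmployer = byEmployer.get(employee.getEmployer());
        if (sameEmployer == null) {
            sameEmployer = new ArrayList<Employee>();
            byEmployer.put(employee.getEmployer(), sameEmployer);
        }
        sameEmployer.add(employee);
    }

    public void delete(Employee employee) {
        employees.remove(employee);
        byNumber.remove(employee.getNumber());
        List<Employee> sameEmployer = byEmployer.get(employee.getEmployer());
        if (sameEmployer != null) {
            sameEmployer.remove(employee);
        }
    }
}
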
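To show the collection semantics in action, here is a minimal, self-contained sketch of a map-backed stub being exercised the way a test would use it. It is an illustration under simplifying assumptions rather than the project code: plain Strings stand in for the EmployeeNumber and EmployerName value types, the two finders are given distinct names, and the class names InMemoryEmployeeRepository and StubUsageDemo are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified entity: Strings stand in for the
// EmployeeNumber and EmployerName value types.
class Employee {
    final String number;
    final String employer;

    Employee(String number, String employer) {
        this.number = number;
        this.employer = employer;
    }
}

// Map-backed stub: no database, no session, nothing to tear down.
class InMemoryEmployeeRepository {
    private final Map<String, Employee> byNumber = new HashMap<>();
    private final Map<String, List<Employee>> byEmployer = new HashMap<>();

    Employee findByNumber(String number) {
        return byNumber.get(number);
    }

    List<Employee> findByEmployer(String employer) {
        // Return a copy so callers cannot modify the stub's internal state.
        return new ArrayList<>(byEmployer.getOrDefault(employer, new ArrayList<>()));
    }

    void add(Employee employee) {
        byNumber.put(employee.number, employee);
        byEmployer.computeIfAbsent(employee.employer, key -> new ArrayList<>()).add(employee);
    }

    void delete(Employee employee) {
        byNumber.remove(employee.number);
        List<Employee> sameEmployer = byEmployer.get(employee.employer);
        if (sameEmployer != null) {
            sameEmployer.remove(employee);
        }
    }
}

class StubUsageDemo {
    public static void main(String[] args) {
        // "Setup" is just constructing a fresh repository for each test.
        InMemoryEmployeeRepository repository = new InMemoryEmployeeRepository();
        Employee alice = new Employee("E1", "Acme");
        Employee bob = new Employee("E2", "Acme");
        repository.add(alice);
        repository.add(bob);

        System.out.println(repository.findByNumber("E1") == alice);   // true
        System.out.println(repository.findByEmployer("Acme").size()); // 2

        repository.delete(alice);
        System.out.println(repository.findByNumber("E1"));            // null
        System.out.println(repository.findByEmployer("Acme").size()); // 1
    }
}
```

Because each test can simply construct a new repository, there is no shared state to reset between tests.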
So now I have two implementations of the same interface. This lets us use either implementation of the repository at runtime. Of course, only the hibernate version will be used in production code, but for our tests, I have the luxury of swapping in the stub rather than using the real hibernate repository. Have I overcome the earlier problems?
  1. Speed? Yes! In-memory maps and collections are fast.
    For example, in some of our acceptance criteria, we are required to create and manage over 30 (average sized) objects. Using the hibernate repositories (and Oracle) takes just over 10 seconds. Using stubs takes around 0.5 seconds. This is 20 times faster! And if you extrapolate this to all of your acceptance tests, it is easy to see the benefit.
  2. Easy debug? Yes! When running acceptance tests, quite often it is nice to stop and have a look at the state of your system part way through. With the hibernate repository, any newly created objects would have been saved to the database. Assuming that these are being saved from inside a transaction, it can be difficult to query the database to see the current state. When using the stub repository, you can easily look at the contents of a map or collection.
You can get close to both of these by using an in-memory database. Hibernate is very nice and will create an in-memory database for you for testing, auto-generated from the hbm mapping files. But the setup and teardown of this database is a lot slower than with stubs, whose setup and teardown is extremely fast. And again, maps and collections are much easier to debug than an in-memory database.
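The debugging point can be illustrated with a small sketch. This is an assumption about how one might expose the state, not something the stubs above require: the snapshot() method and the SnapshotableStubRepository name are hypothetical, and Strings stand in for employees. The idea is that a copy of the stub's contents can be taken (or simply inspected in a debugger) at any point in a test, with no transaction in the way.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stub that can hand out a copy of its current contents.
class SnapshotableStubRepository {
    private final List<String> employees = new ArrayList<>();

    void add(String employee) {
        employees.add(employee);
    }

    void delete(String employee) {
        employees.remove(employee);
    }

    // A read-only copy; later changes to the repository do not affect it.
    List<String> snapshot() {
        return Collections.unmodifiableList(new ArrayList<>(employees));
    }
}

class SnapshotDemo {
    public static void main(String[] args) {
        SnapshotableStubRepository repository = new SnapshotableStubRepository();
        repository.add("alice");
        List<String> midTest = repository.snapshot();

        repository.add("bob");
        repository.delete("alice");

        System.out.println(midTest);               // [alice]
        System.out.println(repository.snapshot()); // [bob]
    }
}
```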

This is quite cool, but there are always disadvantages to any idea. One disadvantage I found was that whenever I added a new method to the repository interface, I needed to implement it in both the hibernate and stub versions. Here lie dragons. What if my implementations diverged? For example, what if the list returned by the hibernate version was sorted differently from the one returned by my stub version? I could potentially be causing myself future headaches.

To get around this, I use the tests that I wrote for the hibernate version of the repository to test both repositories. This is done by moving the tests into an abstract repository test (see http://c2.com/cgi/wiki$?AbstractTestCases). I then extend this abstract test to create one concrete test for the hibernate repository and one for the stub repository. Each of the concrete tests creates its own implementation of the repository in the createRepository() method: the HibernateEmployeeRepositoryTest creates a HibernateEmployeeRepository and the StubEmployeeRepositoryTest creates a StubEmployeeRepository. The tests defined in the abstract test then run against whichever repository was created.

Any future (test driven) functionality that is added to my repository is tested at the abstract test level. This way I can be certain that both the hibernate and stub versions of the repository include the functionality and behave the same. For example, if I decided that the repository should throw an EmployeeNotFound exception in the findBy methods, when testing for it, it would require that this new behaviour be exhibited by both implementations.
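The abstract test arrangement can be sketched roughly as follows. This is a minimal illustration rather than my actual test code. To stay self-contained it uses no test framework and no database, so a second in-memory class (ListBackedRepository) stands in for the hibernate implementation and Strings stand in for employees. Every name other than createRepository() is hypothetical. In a real build the check methods would be JUnit test methods, and the second concrete test would create the HibernateEmployeeRepository.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified repository contract; Strings stand in for Employee objects.
interface NameRepository {
    void add(String name);
    void delete(String name);
    List<String> findAll();
}

// Stand-in for the stub implementation.
class SetBackedRepository implements NameRepository {
    private final Set<String> names = new LinkedHashSet<>();
    public void add(String name) { names.add(name); }
    public void delete(String name) { names.remove(name); }
    public List<String> findAll() { return new ArrayList<>(names); }
}

// Stand-in for the hibernate implementation (a real one would hit the database).
class ListBackedRepository implements NameRepository {
    private final List<String> names = new ArrayList<>();
    public void add(String name) { if (!names.contains(name)) names.add(name); }
    public void delete(String name) { names.remove(name); }
    public List<String> findAll() { return new ArrayList<>(names); }
}

// Every behavioural check lives here, once, written against the interface.
abstract class AbstractRepositoryTest {
    protected abstract NameRepository createRepository();

    void addedObjectsCanBeFound() {
        NameRepository repository = createRepository();
        repository.add("alice");
        check(repository.findAll().contains("alice"), "added object was not found");
    }

    void deletedObjectsAreGone() {
        NameRepository repository = createRepository();
        repository.add("alice");
        repository.delete("alice");
        check(repository.findAll().isEmpty(), "deleted object is still present");
    }

    void resultsKeepInsertionOrder() {
        NameRepository repository = createRepository();
        repository.add("bob");
        repository.add("alice");
        check(repository.findAll().equals(Arrays.asList("bob", "alice")), "ordering differs");
    }

    void runAll() {
        addedObjectsCanBeFound();
        deletedObjectsAreGone();
        resultsKeepInsertionOrder();
    }

    private void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(getClass().getSimpleName() + ": " + message);
        }
    }
}

// The concrete tests only decide which implementation is under test.
class SetBackedRepositoryTest extends AbstractRepositoryTest {
    protected NameRepository createRepository() { return new SetBackedRepository(); }
}

class ListBackedRepositoryTest extends AbstractRepositoryTest {
    protected NameRepository createRepository() { return new ListBackedRepository(); }
}

class AbstractTestDemo {
    public static void main(String[] args) {
        new SetBackedRepositoryTest().runAll();
        new ListBackedRepositoryTest().runAll();
        System.out.println("both implementations pass the shared tests");
    }
}
```

The ordering check is the kind of test that would catch the sorting divergence mentioned earlier: if one implementation returned its results in a different order, its concrete test would fail while the other kept passing.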

Now you may be asking, how do you switch the implementations? I use an environment factory to create the repositories for me, and a switch to give me either a real (hibernate) factory or a stub factory.

For example:

public interface ApplicationTestingEnvironment {
    EmployeeRepository createEmployeeRepository();
    PayrollRepository createPayrollRepository();
}

public class HibernateTestingEnvironment implements ApplicationTestingEnvironment {
    private final SessionFactory sessionFactory;

    public HibernateTestingEnvironment(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public EmployeeRepository createEmployeeRepository() {
        return new HibernateEmployeeRepository(sessionFactory);
    }

    public PayrollRepository createPayrollRepository() {
        return new HibernatePayrollRepository(sessionFactory);
    }
}

public class StubTestingEnvironment implements ApplicationTestingEnvironment {
    public EmployeeRepository createEmployeeRepository() {
        return new StubEmployeeRepository();
    }

    public PayrollRepository createPayrollRepository() {
        return new StubPayrollRepository();
    }
}

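public enum ApplicationTestingFactory {
    USE_STUBS {
        public ApplicationTestingEnvironment create() {
            return new StubTestingEnvironment();
        }
    },
    USE_REAL {
        public ApplicationTestingEnvironment create() {
            // Assumption: a standard hibernate.cfg.xml is on the classpath.
            // Build the SessionFactory however your application normally does.
            return new HibernateTestingEnvironment(
                    new Configuration().configure().buildSessionFactory());
        }
    },
    CHOOSE_USING_SYSTEM_PROPERTY {
        public ApplicationTestingEnvironment create() {
            // Stubs are the default; only -Duse.stubs=false selects the real database.
            if ("false".equals(System.getProperty("use.stubs"))) {
                return USE_REAL.create();
            }
            return USE_STUBS.create();
        }
    };

    // Each constant returns the environment it stands for.
    public abstract ApplicationTestingEnvironment create();
}
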
One of the nice things I've done here is to include the system property as a decider for using a stub or a hibernate repository. This allows us to kick off testing from the command line using either the stub or the real repositories. Mainly because of the fast feedback, I've defaulted to using the stub repositories. But on the continuous integration server, you can leave it using real databases. This can be done as follows:
ant full-build -Duse.stubs=false
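The defaulting behaviour of the switch can be tried out without a database. The following is a cut-down sketch under stated assumptions: the TestingEnvironment and TestingFactory names are hypothetical, and each environment only reports which kind of repositories it would create. The use.stubs property name is the one used above.

```java
// Hypothetical, cut-down environment: it only reports what it would create.
interface TestingEnvironment {
    String describe();
}

enum TestingFactory {
    USE_STUBS {
        @Override
        TestingEnvironment create() {
            return () -> "stub repositories";
        }
    },
    USE_REAL {
        @Override
        TestingEnvironment create() {
            return () -> "hibernate repositories";
        }
    },
    CHOOSE_USING_SYSTEM_PROPERTY {
        @Override
        TestingEnvironment create() {
            // Only an explicit -Duse.stubs=false selects the real repositories;
            // an unset property or any other value falls back to the stubs.
            if ("false".equals(System.getProperty("use.stubs"))) {
                return USE_REAL.create();
            }
            return USE_STUBS.create();
        }
    };

    abstract TestingEnvironment create();
}

class SwitchDemo {
    public static void main(String[] args) {
        TestingEnvironment environment = TestingFactory.CHOOSE_USING_SYSTEM_PROPERTY.create();
        System.out.println("Running tests against " + environment.describe());
    }
}
```

Running this with no arguments reports the stub repositories, while passing -Duse.stubs=false to the JVM reports the hibernate ones. A developer therefore gets the fast path without doing anything, and the build server opts in to the slow one.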
I hope this helps. If you have any questions or comments, please don't hesitate to add a comment below.

2 comments:

  1. How about using an in-memory database, and just swapping out the hibernate properties? Hibernate can also generate the schema of the database from the mapping files.

    That way, you don't need to maintain two versions of the repository as well.

    I've found using H2 works pretty well.

  2. Yes, an in-memory database does help with the speed issue, but as mentioned, it can be difficult to debug.
