An AspectJ library for fault tolerance

Eric | November 1, 2007

As a course project at McGill, I developed a little library for fault tolerance, written in AspectJ 5. It contains two components:

  1. Support for automatic N-Version programming.
  2. A full implementation of a recovery cache.

The fundamental concepts involved are explained in Joerg Kienzle’s lecture slides. N-Version programming lets you implement several different versions of the same algorithm. With the library, those versions are automatically run in parallel and synchronized, and a voter then decides the overall outcome of the concurrent computation (usually by majority vote). Here is some example code:

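// Three independent implementations of the same computation
VersionGroup1 version1 = new Version1();
VersionGroup1 version2 = new Version2();
VersionGroup1 version3 = new Version3();

// The voter decides the overall result
Voter v = new ExceptionalMajorityVoter();

// Put all three versions into one group and register it
VersionGroup group =
    new VersionGroup("group1", v, 1, TimeUnit.SECONDS, version1, version2, version3);
VersionGroupRegistry.v().registerVersionGroup(group);

// Calling one version kicks off the computation for the whole group
System.err.println(version1.compute(1));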

This snippet of code associates three versions with the same group. In the last line, where we kick off the computation, an around-advice will automatically start all three versions and then return the voted result instead of the original one.
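To give a rough idea of what such an advice has to do behind the scenes, here is a minimal sketch in plain Java rather than AspectJ. It is not the library's code: the class NVersionSketch and the method runAndVote are made up for illustration, and I am assuming a simple strict-majority vote in which a version that fails or times out casts no vote.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from the library.
class NVersionSketch {

    // Runs all versions concurrently, waits at most the given timeout, and
    // returns the result that a strict majority of the versions agree on.
    static <T> T runAndVote(List<Callable<T>> versions, long timeout, TimeUnit unit) {
        ExecutorService pool = Executors.newFixedThreadPool(versions.size());
        try {
            // invokeAll blocks until every task is done or the timeout expires;
            // tasks that are still running at the deadline are cancelled.
            List<Future<T>> futures = pool.invokeAll(versions, timeout, unit);
            Map<T, Integer> votes = new HashMap<>();
            for (Future<T> f : futures) {
                if (f.isCancelled()) {
                    continue; // timed out: no vote
                }
                try {
                    votes.merge(f.get(), 1, Integer::sum);
                } catch (ExecutionException e) {
                    // a version that threw an exception casts no vote
                }
            }
            for (Map.Entry<T, Integer> e : votes.entrySet()) {
                if (e.getValue() * 2 > versions.size()) {
                    return e.getKey();
                }
            }
            throw new IllegalStateException("no majority among versions");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for versions", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        List<Callable<Integer>> versions = new ArrayList<>();
        versions.add(() -> 1 + 1);
        versions.add(() -> 2 * 1);
        versions.add(() -> 3); // a deliberately faulty version
        System.err.println(runAndVote(versions, 1, TimeUnit.SECONDS)); // prints 2
    }
}
```

In the library the same steps happen inside the advice, so client code keeps calling compute() on a single version and never sees the thread pool or the voter.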

The recovery cache, on the other hand, allows you to do checkpointing on the fly: just call the checkpoint method, make some state changes and then, for instance in case of an exception, call restore(). All the previous heap state will be restored for you.
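The idea behind a recovery cache can be sketched in a few lines of plain Java. This is only an illustration under simplifying assumptions, not the library's implementation: the names RecoveryCacheSketch, Cell and set are hypothetical, and writes have to go through the cache explicitly here, whereas the library restores heap state without such explicit calls.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical sketch; the names below are not the library's API.
class RecoveryCacheSketch {

    // A mutable cell standing in for an object field on the heap.
    static final class Cell<T> {
        private T value;
        Cell(T initial) { value = initial; }
        T get() { return value; }
    }

    // One map per active checkpoint: cell -> action that puts back the value
    // the cell held when that checkpoint was taken.
    private final Deque<Map<Cell<?>, Runnable>> checkpoints = new ArrayDeque<>();

    // Starts a new recovery region.
    void checkpoint() {
        checkpoints.push(new IdentityHashMap<>());
    }

    // Writes a cell, saving its old value on the first write after a checkpoint.
    <T> void set(Cell<T> cell, T newValue) {
        Map<Cell<?>, Runnable> saved = checkpoints.peek();
        if (saved != null && !saved.containsKey(cell)) {
            T old = cell.value;
            saved.put(cell, () -> cell.value = old);
        }
        cell.value = newValue;
    }

    // Rolls every cell written since the latest checkpoint back to its old value.
    void restore() {
        Map<Cell<?>, Runnable> saved = checkpoints.pop();
        for (Runnable undo : saved.values()) {
            undo.run();
        }
    }

    public static void main(String[] args) {
        RecoveryCacheSketch cache = new RecoveryCacheSketch();
        Cell<Integer> balance = new Cell<>(100);
        cache.checkpoint();
        try {
            cache.set(balance, 40);
            throw new IllegalStateException("simulated failure");
        } catch (IllegalStateException e) {
            cache.restore();
        }
        System.err.println(balance.get()); // prints 100
    }
}
```

Only the first write to a cell after a checkpoint needs to be recorded, which is why the cache stays small: it holds just the original values of whatever was actually modified.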

The library can be downloaded in the form of two Eclipse projects here:

  1. NVersion
  2. RecoveryCache

Example clients are included. The code is made available under the BSD license.