Instrumenting Android Apps with Soot

Eric | January 8, 2013

I am excited to let you know that we have recently committed support for reading and writing Dalvik bytecode to the development branch of Soot. (This code will also be contained in Soot’s upcoming release.) This support consists of two major modules. One is called Dexpler, mainly developed by a group around Alexandre Bartel, with some enhancements by Ben Bellamy and myself as well as Frank Hartmann and Michael Markert, two students of mine. Dexpler converts Dalvik bytecode into Jimple’s three-address code.

This may sound simple: after all, Dalvik code is register-based and Jimple uses local variables, which are quite similar to logical registers. However, things get tricky with respect to typing. Jimple is typed; every local variable has a declared type. In Dalvik, registers are untyped, and during the execution of a method the same register can hold values of quite different types. Constants in Dalvik are also untyped: when loading a double or a long into a register, Dalvik just loads an eight-byte bit pattern without telling you whether it is a long or a double. But in Jimple we need this information. Getting the typing of Jimple locals right is therefore quite tricky and took us a while. On the other hand, typed locals are great, as they allow for a simpler and more precise pointer analysis, among other things.
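To make the problem with untyped wide constants concrete, here is a small sketch in plain Java (it is not Dexpler code, and the class and method names are made up for illustration). It shows that one and the same eight-byte bit pattern is a perfectly valid long and a perfectly valid double, so the type has to be inferred from how the value is used later on.

```java
// Illustration only: plain Java, not part of Soot or Dexpler.
final class WideConstantDemo {

    // Interpret the 64 bits as a long (the bits are taken as-is).
    static long asLong(long bits) {
        return bits;
    }

    // Interpret the very same 64 bits as an IEEE 754 double.
    static double asDouble(long bits) {
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        // Assumed example value: the kind of raw bit pattern a wide constant load leaves in a register pair.
        long bits = 0x4000000000000000L;
        System.out.println(asLong(bits));   // prints 4611686018427387904
        System.out.println(asDouble(bits)); // prints 2.0
    }
}
```

Both readings are legal, which is why the bytecode alone does not tell you which Jimple type the local should get.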

The second component does just the opposite: it converts Jimple back into Dalvik code. This component was completed quite recently by Thomas Pilot, another one of my students. One of the main obstacles here is again the mismatch between local variables and registers: Soot must perform at least a somewhat clever register allocation to avoid using up too many registers. This currently works well enough to produce functional Dalvik code. However, the resulting code may not always have the same structure as the original Dalvik code you read into Soot.
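To give an idea of what "somewhat clever" register allocation means, here is a minimal sketch of a greedy scheme that reuses a register as soon as the local that occupied it is no longer live. This is only an illustration under simplifying assumptions (each local has a single live range given as statement indices); it is not the allocator Soot actually uses, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustration only: a greedy register reuse sketch, not Soot's allocator.
final class RegisterReuseSketch {

    // Hypothetical live range: the local is live from statement 'start' to statement 'end' (inclusive).
    static final class LiveRange {
        final String local;
        final int start;
        final int end;
        int register = -1; // filled in by allocate()

        LiveRange(String local, int start, int end) {
            this.local = local;
            this.start = start;
            this.end = end;
        }
    }

    // Assigns a register to every range and returns the number of registers used.
    static int allocate(List<LiveRange> ranges) {
        List<LiveRange> byStart = new ArrayList<>(ranges);
        byStart.sort(Comparator.comparingInt((LiveRange r) -> r.start));

        // Ranges currently holding a register, ordered by where they end.
        PriorityQueue<LiveRange> active =
                new PriorityQueue<>(Comparator.comparingInt((LiveRange r) -> r.end));
        // Registers that were handed out before and are free again.
        PriorityQueue<Integer> free = new PriorityQueue<>();
        int nextFresh = 0;

        for (LiveRange r : byStart) {
            // Release registers of locals that died before this one becomes live.
            while (!active.isEmpty() && active.peek().end < r.start) {
                free.add(active.poll().register);
            }
            // Prefer a recycled register; take a fresh one only if none is free.
            r.register = free.isEmpty() ? nextFresh++ : free.poll();
            active.add(r);
        }
        return nextFresh;
    }

    public static void main(String[] args) {
        List<LiveRange> ranges = new ArrayList<>();
        ranges.add(new LiveRange("a", 0, 2));
        ranges.add(new LiveRange("b", 1, 3));
        ranges.add(new LiveRange("c", 3, 5));
        ranges.add(new LiveRange("d", 4, 6));
        System.out.println(allocate(ranges)); // prints 2
    }
}
```

In this example four locals fit into two registers, whereas giving each local its own register would need four.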

How to instrument

First grab the latest version of Soot, for instance our nightly build. Also check out the directory at https://github.com/Sable/android-platforms. This directory contains different versions of the Android standard library that Soot requires for resolving types of apps you analyze or instrument.
Next we implement a driver class with a main method into which we stick the following code:

//prefer Android APK files// -src-prec apk
Options.v().set_src_prec(Options.src_prec_apk);

//output as APK, too// -f dex
Options.v().set_output_format(Options.output_format_dex);

// resolve the PrintStream and System soot-classes
Scene.v().addBasicClass("java.io.PrintStream",SootClass.SIGNATURES);
Scene.v().addBasicClass("java.lang.System",SootClass.SIGNATURES);

The first option instructs Soot to load Android APK files. The second one instructs Soot to produce a Dex/APK file as output. (In theory you could also convert Java into Dex or Dex into Java and so on.) The last two calls tell Soot to load two classes that our instrumentation requires but that the instrumented APK may not otherwise reference.

Next we add a Transform to Soot:

PackManager.v().getPack("jtp").add(new Transform("jtp.myInstrumenter", new BodyTransformer() {

	@Override
	protected void internalTransform(final Body b, String phaseName, @SuppressWarnings("rawtypes") Map options) {
		final PatchingChain<Unit> units = b.getUnits();
		//important to use snapshotIterator here, because we insert into units while iterating
		for(Iterator<Unit> iter = units.snapshotIterator(); iter.hasNext();) {
			final Unit u = iter.next();
			u.apply(new AbstractStmtSwitch() {

				public void caseInvokeStmt(InvokeStmt stmt) {
					//code here
				}

			});
		}
	}
}));

This will walk through all Units of all Bodies in the APK and on every InvokeStmt will invoke the code which I labeled with “code here”.
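Why does the snapshot iterator matter? The instrumentation inserts new statements into the very chain it is walking. The following sketch shows the same situation with an ordinary java.util.ArrayList instead of Soot's PatchingChain (so it is an analogy, not Soot code; the statement strings and method names are made up): iterating the live list while inserting fails, while iterating a copy works.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Illustration only: java.util collections standing in for Soot's chain of units.
final class SnapshotIterationDemo {

    // Inserts "print HELLO" before every statement starting with "invoke",
    // iterating over a snapshot (copy) of the list while modifying the original.
    static List<String> instrumentWithSnapshot(List<String> stmts) {
        List<String> units = new ArrayList<>(stmts);
        List<String> snapshot = new ArrayList<>(units);
        int pos = 0; // index of the current snapshot element in the live list
        for (String u : snapshot) {
            if (u.startsWith("invoke")) {
                units.add(pos, "print HELLO");
                pos++; // the current element moved one slot to the right
            }
            pos++;
        }
        return units;
    }

    // Tries the same insertion while iterating the live list; reports whether that failed.
    static boolean liveIterationFails(List<String> stmts) {
        List<String> units = new ArrayList<>(stmts);
        try {
            for (String u : units) {
                if (u.startsWith("invoke")) {
                    units.add(0, "print HELLO");
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> body = Arrays.asList("a = 1", "invoke onDraw", "return");
        System.out.println(instrumentWithSnapshot(body)); // prints [a = 1, print HELLO, invoke onDraw, return]
        System.out.println(liveIterationFails(body));     // prints true
    }
}
```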

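At the spot marked “code here” we can now insert the following. (The helper methods addTmpRef and addTmpString are not shown; they are part of the full example linked at the end of this post and each add a fresh local to the body.)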

InvokeExpr invokeExpr = stmt.getInvokeExpr();
if(invokeExpr.getMethod().getName().equals("onDraw")) {

	Local tmpRef = addTmpRef(b);
	Local tmpString = addTmpString(b);

	// insert "tmpRef = java.lang.System.out;"
	units.insertBefore(Jimple.v().newAssignStmt(
			tmpRef, Jimple.v().newStaticFieldRef(
			Scene.v().getField("<java.lang.System: java.io.PrintStream out>").makeRef())), u);

	// insert "tmpString = 'HELLO';"
	units.insertBefore(Jimple.v().newAssignStmt(tmpString,
			StringConstant.v("HELLO")), u);

	// insert "tmpRef.println(tmpString);"
	SootMethod toCall = Scene.v().getSootClass("java.io.PrintStream").getMethod("void println(java.lang.String)");
	units.insertBefore(Jimple.v().newInvokeStmt(
			Jimple.v().newVirtualInvokeExpr(tmpRef, toCall.makeRef(), tmpString)), u);

	// check that we did not mess up the Jimple
	b.validate();
}

This causes Soot to insert a System.out.println("HELLO") just before the method invocation but only if the target of this invocation is an onDraw method.

Last but not least, don’t forget to actually call Soot’s main method:

soot.Main.main(args);

And that’s it! Piece of cake, isn’t it? All you now need to do is run your driver class with the following arguments:

-android-jars path/to/android-platforms -process-dir your.apk

Here path/to/android-platforms is the path to the platform JAR files you downloaded earlier and your.apk is the path to the APK you wish to instrument. The option -process-dir instructs Soot to process all classes inside this APK.
As a result you will find a new APK with the same name inside the directory ./sootOutput.
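If you prefer to assemble the arguments in code rather than on the command line, something like the following sketch could work. It only builds the argument array described above and computes where the instrumented APK is expected to end up; the class name, method names and example paths are hypothetical, and the actual call to soot.Main.main is left as a comment so the snippet runs without Soot on the classpath.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustration only: helper names and example paths are assumptions.
final class SootArgsSketch {

    // Builds the two options described above, in the same order.
    static String[] buildArgs(String platformsDir, String apk) {
        return new String[] {"-android-jars", platformsDir, "-process-dir", apk};
    }

    // The instrumented APK keeps its file name and is written to ./sootOutput.
    static Path expectedOutput(String apk) {
        return Paths.get("sootOutput").resolve(Paths.get(apk).getFileName());
    }

    public static void main(String[] args) {
        String[] sootArgs = buildArgs("path/to/android-platforms", "apps/your.apk");
        System.out.println(String.join(" ", sootArgs));
        System.out.println(expectedOutput("apps/your.apk").getFileName()); // prints your.apk
        // In the real driver you would now call: soot.Main.main(sootArgs);
    }
}
```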

You can download the entire code of the example here: AndroidInstrument.java

If you find any bugs in those components (or other parts of Soot) please help us out by reporting them in our issue tracker.