Advanced Forensic Framework 4 (AFF4)
Why did we want to design yet another forensic file format?
Traditional forensic file formats have a number of limitations which have been exposed over the years:
- Proprietary formats like EWF are difficult to implement and explain. EWF is a fairly complex file format. Most of the details are reverse engineered. Recovery from damaged EWF files is difficult as detailed knowledge of the file format is required.
- Simple file formats like dd produce very large files since they are uncompressed. They also don't store metadata or signatures, and have no cryptographic support.
- Traditional file formats are designed to store a single stream. Often in an investigation, however, multiple sources of data need to be acquired (sometimes simultaneously) and stored in the same evidence volumes.
- Traditional file formats just deal with data - there is no attempt to build a universal evidence management system integrated within the file specification.
The previous AFF format made huge advances in the field, introducing excellent support for cryptography, digital signatures, compression and even the concept of external referencing. It was time to gather up all the good things in AFF and design a new AFF4 specification.
We wanted to use a well recognized, widely supported and open bit level format. One of the strengths of AFF was the use of segments within the file format itself. It became obvious that the only requirement we have of an underlying storage mechanism is the ability to store blobs of data by name, and retrieve them by that name. How these are actually stored is quite irrelevant to us.
The sections below give a quick overview of some of the major ideas.
AFF4 is an object oriented architecture. We call the total set of known objects the AFF4 universe. Because AFF4 is designed to scale to huge evidence corpora, the AFF4 universe is infinite. All objects are addressable by their name, which is unique in the universe. For example, an AFF4 object might have a name of:
This name is in standard URN notation. The URN is unique. There will never be another object created anywhere in the universe with the same URN. Once objects are created their URN is fixed.
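A name of this kind could be generated as in the minimal sketch below. It assumes that the part after the urn:aff4: prefix is a random (version 4) UUID, which matches the shape of the URNs that appear in this document but is not stated by it. The helper names new_aff4_urn and is_aff4_urn are hypothetical and do not come from any AFF4 library.

```python
import uuid

AFF4_PREFIX = "urn:aff4:"  # prefix seen on the object names in this document


def new_aff4_urn() -> str:
    """Return a fresh name (hypothetical helper; assumes a random UUID suffix)."""
    return AFF4_PREFIX + str(uuid.uuid4())


def is_aff4_urn(name: str) -> bool:
    """Check that a name has the assumed shape: the prefix followed by a UUID."""
    if not name.startswith(AFF4_PREFIX):
        return False
    try:
        uuid.UUID(name[len(AFF4_PREFIX):])
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    print(new_aff4_urn())
```

Random UUIDs make a collision between two independently created names extremely unlikely, which is one practical way to approximate the "never another object with the same URN" property without a central registry.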
The AFF4 universe uses RDF to specify attributes about objects. In its simplest form (the one we use) RDF is just a set of statements about an object of the form:
Subject Attribute Value
******** Object urn:aff4:f3eba626-505a-4730-8216-1987853bc4d2 ***********
aff4:stored = urn:aff4:4bdbf8bc-d8a5-40cb-9af0-fd7e4d0e2c9e
aff4:type = image
aff4:interface = stream
aff4:timestamp = 0x49E9DEC3
aff4:chunk_size = 32k
aff4:compression = 8
aff4:chunks_in_segment = 2048
aff4:size = 10485760
This shows that the named object (the Subject) has all these attributes and their values. We call these relations or facts. The entire AFF4 universe is constructed around these facts. As we will see later, facts can be signed by a person, which essentially means that the person asserts the facts are true.
AFF4 objects exist because they do something useful. What they do depends on the interface they present. Currently there are a few interfaces; the most important are the Volume interface and the Stream interface. An object's interface is a fact about the object with the attribute aff4:interface. This tells us what the object can do for us.
On the other hand, AFF4 objects can actually be different things and do what they do in different ways. The actual type of an object is specified by the attribute aff4:type. Whereas an interface tells us what the object can do for us, a type tells us what it actually is. (It's possible to change an object's type without changing its interface, for example going from a ZipFile to a Directory volume. This does not affect any users of the object.)
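To make the idea of facts concrete, the sketch below holds them as plain (subject, attribute, value) tuples and fills them from a dump in the layout shown above. The parser only assumes that layout (a starred header line naming the object, then "attribute = value" lines); the names parse_dump and lookup are hypothetical and this is not the format's own serialization code.

```python
from typing import List, Optional, Tuple

Fact = Tuple[str, str, str]  # (subject, attribute, value)

DUMP = """\
******** Object urn:aff4:f3eba626-505a-4730-8216-1987853bc4d2 ***********
aff4:stored = urn:aff4:4bdbf8bc-d8a5-40cb-9af0-fd7e4d0e2c9e
aff4:type = image
aff4:interface = stream
aff4:size = 10485760
"""


def parse_dump(text: str) -> List[Fact]:
    """Turn a dump in the layout above into a list of facts."""
    facts: List[Fact] = []
    subject: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("*") and " Object " in line:
            # Header line: the subject is the token after the word "Object".
            subject = line.strip("* ").split()[1]
        elif " = " in line and subject is not None:
            attribute, value = line.split(" = ", 1)
            facts.append((subject, attribute.strip(), value.strip()))
    return facts


def lookup(facts: List[Fact], subject: str, attribute: str) -> Optional[str]:
    """Return the first value recorded for (subject, attribute), if any."""
    for s, a, v in facts:
        if s == subject and a == attribute:
            return v
    return None


if __name__ == "__main__":
    for fact in parse_dump(DUMP):
        print(fact)
```

Values are kept as strings here on purpose: how a value such as 32k or 0x49E9DEC3 should be interpreted depends on the attribute, which this sketch does not try to model.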
We define a Volume as a storage mechanism which can store a segment (bit of binary data) by name and retrieve it by name. Currently we have two volume implementations: a Directory and a ZipFile.
The Directory implementation stores the segments as flat files inside a regular directory on the filesystem. This is really useful if we want to image to a FAT filesystem, since each segment is small and will not exceed the file size limit. It's also possible to root the directory at an HTTP URL (i.e. the directory starts with http://somehost/url/). This allows us to use the image directly from the web, with no need to download the whole thing.
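The store-by-name, retrieve-by-name contract of a directory-backed volume can be sketched in a few lines. This is an illustration only: the class DirectoryVolume and its store and retrieve methods are hypothetical names, and the sketch assumes segment names are plain file names without path separators, which the real format may not require.

```python
import os
import tempfile


class DirectoryVolume:
    """Hypothetical sketch: each segment is one flat file under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, name: str) -> str:
        # Assumption: names are simple file names, so reject anything path-like.
        if os.path.basename(name) != name or name in ("", ".", ".."):
            raise ValueError("unsupported segment name: %r" % name)
        return os.path.join(self.root, name)

    def store(self, name: str, data: bytes) -> None:
        with open(self._path(name), "wb") as f:
            f.write(data)

    def retrieve(self, name: str) -> bytes:
        with open(self._path(name), "rb") as f:
            return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        volume = DirectoryVolume(os.path.join(tmp, "evidence"))
        volume.store("segment-0", b"\x00" * 16)
        print(len(volume.retrieve("segment-0")))
```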
The ZipFile implementation stores segments inside a zip archive. If the archive gets too large (over 4 GB) we use the Zip64 extensions to store offsets in 64 bits. This is nice since small volumes can just be opened with Windows Explorer. It's also really easy to extract the data.
Example: http://www.pyflag.net/images/test.zip is an example of a small (about 1 MB) AFF4 image.
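A zip-backed volume with the same store and retrieve contract might look like the sketch below, built on Python's standard zipfile module. The class name ZipVolume and its methods are hypothetical. The allowZip64=True argument lets the module write Zip64 records when an archive outgrows the classic limits; it is spelled out here only to mirror the text, since it is already the default in current Python versions.

```python
import os
import tempfile
import zipfile


class ZipVolume:
    """Hypothetical sketch: each segment is one member of a zip archive."""

    def __init__(self, path: str) -> None:
        self.path = path

    def store(self, name: str, data: bytes) -> None:
        # Mode "a" appends to an existing archive or creates a new one.
        with zipfile.ZipFile(self.path, "a", allowZip64=True) as zf:
            zf.writestr(name, data)

    def retrieve(self, name: str) -> bytes:
        with zipfile.ZipFile(self.path, "r") as zf:
            return zf.read(name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        volume = ZipVolume(os.path.join(tmp, "volume.zip"))
        volume.store("segment-0", b"first")
        volume.store("segment-1", b"second")
        print(volume.retrieve("segment-1"))
```

Reopening the archive on every call keeps the sketch short; a real implementation would more likely keep the archive open while an image is being written.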
Directory and ZipFile volumes can be easily converted from one to the other (i.e. unzip the ZipFile into a directory to create a Directory volume).
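The conversion described here amounts to unpacking or repacking the segments. The sketch below shows both directions with the standard zipfile module; the function names are hypothetical, and it deliberately ignores any volume-level metadata that a complete implementation might need to update.

```python
import os
import tempfile
import zipfile
from typing import List


def zip_to_directory(zip_path: str, directory: str) -> List[str]:
    """Unpack every member of the archive as a file under `directory`."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(directory)
        return zf.namelist()


def directory_to_zip(directory: str, zip_path: str) -> List[str]:
    """Pack every file under `directory` into a new archive.

    `zip_path` should live outside `directory`, otherwise the archive
    would try to include itself.
    """
    names: List[str] = []
    with zipfile.ZipFile(zip_path, "w", allowZip64=True) as zf:
        for root, _dirs, files in os.walk(directory):
            for filename in sorted(files):
                full = os.path.join(root, filename)
                # Zip member names always use forward slashes.
                name = os.path.relpath(full, directory).replace(os.sep, "/")
                zf.write(full, arcname=name)
                names.append(name)
    return names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "source")
        os.makedirs(source)
        with open(os.path.join(source, "segment-0"), "wb") as f:
            f.write(b"data")
        archive = os.path.join(tmp, "volume.zip")
        print(directory_to_zip(source, archive))
        print(zip_to_directory(archive, os.path.join(tmp, "copy")))
```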
Streams are the basic interface for storing image data. Streams present a consistent interface consisting of the methods read, seek, tell and close. (Streams also support write, but that's a bit special because it's how you actually create them.)
As long as an AFF4 object presents a stream interface, it's possible to perform random reads within the body of data. Hence it's possible to store any image data within a stream. The following sections explain some of the specific implementations of streams.
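The four read-side methods can be illustrated with a stream backed by a single in-memory blob. This is a minimal sketch under stated assumptions: the class name SegmentStream is hypothetical, the method semantics are borrowed from Python file objects (the text names the methods but not their exact behaviour), and write is left out because creating streams is treated separately.

```python
import os


class SegmentStream:
    """Hypothetical read-only stream over one blob of bytes."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0
        self._closed = False

    def _check_open(self) -> None:
        if self._closed:
            raise ValueError("operation on closed stream")

    def seek(self, offset: int, whence: int = os.SEEK_SET) -> int:
        self._check_open()
        if whence == os.SEEK_SET:
            base = 0
        elif whence == os.SEEK_CUR:
            base = self._pos
        elif whence == os.SEEK_END:
            base = len(self._data)
        else:
            raise ValueError("invalid whence: %r" % whence)
        if base + offset < 0:
            raise ValueError("negative seek position")
        self._pos = base + offset
        return self._pos

    def tell(self) -> int:
        self._check_open()
        return self._pos

    def read(self, length: int = -1) -> bytes:
        """Read up to `length` bytes, or everything that is left if negative."""
        self._check_open()
        end = len(self._data) if length < 0 else self._pos + length
        chunk = self._data[self._pos:end]
        self._pos += len(chunk)
        return chunk

    def close(self) -> None:
        self._closed = True


if __name__ == "__main__":
    stream = SegmentStream(b"0123456789")
    stream.seek(4)
    print(stream.read(3), stream.tell())
    stream.close()
```

Because callers only rely on read, seek, tell and close, the same code can sit on top of a different backing store without the caller noticing, which is the point of separating an object's interface from its type.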