Difference between pages "Carver 2.0 Planning Page" and "Category:Forensics File Formats"

From ForensicsWiki
This page is for planning Carver 2.0.

Please, do not delete text (ideas) here. Use something like this:

<pre>
<s>bad idea</s>
:: good idea
</pre>

This will look like:

<s>bad idea</s>
:: good idea

= License =

BSD-3.
:: [[User:Joachim Metz|Joachim]] library based validators could require other licenses

----

Many computer forensic programs, especially the all-in-one suites, use their own file formats to store information.

==Independent File Formats==
These file formats were developed independently of any specific forensics package.

=== [[AFF]] ===
Full details of the format and a working implementation can be downloaded from http://www.afflib.org/

=== [[AFF4]] ===
AFF4 is a complete redesign of the AFF format, geared towards very large corpora of images. It features a choice of binary container formats such as Zip, Zip64 and simple directories. Storage can be done using regular HTTP, as well as imaging directly to a central HTTP server using WebDAV. The format includes support for maps, which are zero-copy transformations of data: for example, instead of storing a whole new copy of a carved file, only a map of the blocks allocated to that file is stored. This makes it trivial to chop up an image in many different ways with no storage overhead (for example, chop up a memory image into the different process address spaces, extract TCP streams from a PCAP file, or extract all files from a file system, all without copying). AFF4 also supports cryptography and image signing, and supports FUSE to present the images transparently to clients.

=== [[gfzip]] (generic forensic zip) file format ===
Gfzip aims to provide an open file format for 'forensic complete', 'compressed' and 'signed' disk image data files.
Uncompressed disk images can be used the same way [[dd]] images are, as gfzip uses a data-first, footer-last design.
Gfzip uses multi-level [[SHA256]] digest based integrity guards instead of [[SHA1]] or the deprecated [[MD5]] algorithm.
User-supplied metadata is embedded in a metadata section within the file.
A very important feature that gfzip focuses on extensively is the use of signed data and metadata sections using X.509 certificates.

==Program-Specific File Formats==
These file formats were developed for use with a specific forensics program. Sometimes they can be used with other programs whose authors have specifically reverse-engineered the software. Other times they cannot.
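The gfzip layout described above (raw data first, an integrity footer last, so the uncompressed image stays dd-compatible) can be illustrated with a minimal sketch. The footer layout used here (a magic tag, a size field, a single SHA256 digest) is an invented simplification for illustration, not the real gfzip format:

```python
import hashlib
import struct

FOOTER_LEN = 4 + 8 + 32  # magic + 64-bit size + SHA256 digest

def write_image(data: bytes) -> bytes:
    """Data first, footer last: append magic, data size, and a SHA256 guard."""
    digest = hashlib.sha256(data).digest()
    footer = b"GFZ0" + struct.pack("<Q", len(data)) + digest
    return data + footer

def verify_image(blob: bytes) -> bytes:
    """Read the footer from the end, then check the digest over the data."""
    magic = blob[-FOOTER_LEN:-40]
    (size,) = struct.unpack("<Q", blob[-40:-32])
    digest = blob[-32:]
    if magic != b"GFZ0":
        raise ValueError("not a footer-last image")
    data = blob[:size]
    if hashlib.sha256(data).digest() != digest:
        raise ValueError("integrity check failed")
    return data
```

Because the data comes first, tools that ignore the footer can still treat the file as a raw image; only footer-aware readers verify the digest.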
  
===[[Encase image file format]]===
Perhaps the de facto standard for forensic analyses in law enforcement, Guidance Software's [[EnCase]] Forensic uses a closed format for images. This format is heavily based on ASR Data's Expert Witness Compression Format. EnCase's Evidence File (.E01) format contains a physical bitstream of an acquired disk, prefixed with a "Case Info" header, interlaced with CRCs for every block of 64 sectors (32 KB), and followed by a footer containing an [[MD5]] hash for the entire bitstream. Contained in the header are the date and time of acquisition, an examiner's name, notes on the acquisition, and an optional password; the header concludes with its own CRC.

Not only is the format compressible, it is also searchable. Compression is block-based, and jump tables and "file pointers" are maintained in the format's header or between blocks "to enhance speed". Disk images can be split into multiple segment files (e.g., for archival to CD or DVD).

Up to version 5 of [[EnCase]] the segment files could be no larger than 2 GB. This restriction has been removed in version 6 of EnCase by working around the 31-bit offset values.

The format restricts the type and quantity of metadata that can be associated with an image. Extended EWF (EWF-X), defined by the libewf project, provides a workaround for this restriction, specifying new header and (digest) hash sections that use XML strings to store the metadata. These EWF-X E01 files are compatible with EnCase and allow more metadata to be stored.

Though some have reverse-engineered the format for compatibility's sake, Guidance's extensions to the format remain closed.

===[[ILook Investigator]]'s IDIF, IRBF, and IEIF Formats===
ILook Investigator v8 and its disk-imaging counterpart, [[IXimager]], offer three proprietary, authenticated image formats: compressed (IDIF), non-compressed (IRBF), and encrypted (IEIF). Although few technical details are disclosed publicly, IXimager's online documentation provides some insights: IDIF "includes protective mechanisms to detect changes from the source image entity to the output form" and supports "logging of user actions within the confines of that event;" IRBF is similar to IDIF except that disk images are left uncompressed; IEIF, meanwhile, encrypts said images.

For compatibility with ILook Investigator v7 and other forensic tools, IXimager allows for the transformation of each of these formats into raw format.

===[[ProDiscover]] Family's [[ProDiscover image file format]]===
Used by the [[Technology Pathways]] [[ProDiscover]] Family of security tools, the ProDiscover Image File format consists of five parts: a 16-byte Image File Header, which includes a signature and version number for an image; a 681-byte Image Data Header, which contains user-provided metadata about the image; Image Data, which comprises a single block of uncompressed data or an array of blocks of compressed data; an Array of Compressed Block sizes (if the Image Data is, in fact, compressed); and I/O Log Errors describing any problems during the image's acquisition.

Though fairly well documented, the format is not extensible.

----

= OS =
Linux/FreeBSD/MacOS
: Shouldn't this just match what the underlying afflib & sleuthkit cover? [[User:RB|RB]]
:: Yes, but you need to test and validate on each. Question: Do we want to support Windows? [[User:Simsong|Simsong]] 21:09, 30 October 2008 (UTC)
:: [[User:Joachim Metz|Joachim]] I think we would be wise to design with Windows support from the start; this will improve the platform independence from the start
:::: Agreed; I would even settle at first for being able to run against Cygwin. Note that I don't even own or use a copy of Windows, but the vast majority of forensic investigators do. [[User:RB|RB]] 14:01, 31 October 2008 (UTC)
:: [[User:Capibara|Rob J Meijer]] Leaning heavily on the autotools might be the way to go. I do however feel that support requirements for Windows would not be essential. Being able to run from a virtual machine with the main storage mounted over CIFS should however be tested and if possible tuned extensively.
:::: [[User:Joachim Metz|Joachim]] You'll need more than autotools to do native Windows support, i.e. file access, UTF-16 support, wrapping some basic system functions or having them available otherwise

= Name tooling =
* [[User:Joachim Metz|Joachim]] A name for the tooling I propose: coldcut
:: How about 'butcher'?  ;)  [[User:RB|RB]] 14:20, 31 October 2008 (UTC)
:: [[User:Joachim Metz|Joachim]] cleaver (scalpel on steroids ;-) )
* I would like to propose Gouge or Chisel :-) [[User:Capibara|Rob J Meijer]]

= Requirements =
[[User:Joachim Metz|Joachim]] Could we do a MoSCoW evaluation of these?

* AFF and EWF file images supported from scratch. ([[User:Joachim Metz|Joachim]] I would like to have raw/split raw and device access as well)
:: If we base our image I/O on afflib, we get all three with one interface. [[User:RB|RB]] Instead of letting the tools use afflib, better to write an afflib module for carvfs and update the libewf module. The tool could then be oblivious of the file format. [[User:Capibara|Rob J Meijer]]
:::: [[User:Joachim Metz|Joachim]] this layer should support multi-threaded decompression of compressed image types; this speeds up I/O
* [[User:Joachim Metz|Joachim]] volume/partition aware layer (what about carving unpartitioned space?)
* File system aware layer. This could be or make use of tsk-cp.
** By default, files are not carved. (clarify: only identified? [[User:RB|RB]]; I guess that it operates like [[Selective file dumper]] [[User:.FUF|.FUF]] 07:00, 29 October 2008 (UTC)). Alternatively, the tool could use libcarvpath and output carvpaths, or create a directory with symlinks to carvpaths that point into a carvfs mountpoint [[User:Capibara|Rob J Meijer]].
* Plug-in architecture for identification/validation.
** [[User:Joachim Metz|Joachim]] support for multiple types of validators
*** dedicated validator
*** validator based on file library (i.e. we could specify/implement a file structure API for these)
*** configuration based validator (can handle config files, like Revit07, to enter different file formats used by the carver)
* Ship with validators for:
[[User:Joachim Metz|Joachim]] I think we should distinguish between file format validators and content validators
** JPEG
** PNG
** GIF
** MSOLE
** ZIP
** TAR (gz/bz2)

[[User:Joachim Metz|Joachim]] For a production carver we need at least the following formats
** Graphical images
*** JPEG (the 3 different types, with JFIF/EXIF support)
*** PNG
*** GIF
*** BMP
*** TIFF
** Office documents
*** OLE2 (Word/Excel content support)
*** PDF
*** Open Office/Office 2007 (ZIP+XML)
:: Extension validation? AFAIK, MS Office 2007 [[DOCX]] format uses plain ZIP (or not?), and carved files will (or not?) have a .zip extension instead of .docx. Is there any way to fix this (maybe using the file list in the ZIP)? [[User:.FUF|.FUF]] 20:25, 31 October 2008 (UTC)
:: [[User:Joachim Metz|Joachim]] Addition: Office 2007 also has a binary file format which is also ZIP-ed data
** Archive files
*** ZIP
*** 7z
*** gzip
*** bzip2
*** tar
*** RAR
** E-mail files
*** PFF (PST/OST)
*** MBOX (text based format, base64 content support)
** Audio/Video files
*** MPEG
*** MP2/MP3
*** AVI
*** ASF/WMV
*** QuickTime
*** MKV
** Printer spool files
*** EMF (if I remember correctly)
** Internet history files
*** index.dat
*** firefox (sqlite 3)
** Other files
*** thumbs.db
*** pagefile?

* Simple fragment recovery carving using gap carving.
** [[User:Joachim Metz|Joachim]] have a hook for more advanced fragment recovery?
* Recovery of individual ZIP sections and JPEG icons that are not sector aligned.
** [[User:Joachim Metz|Joachim]] I would propose a generic fragment detection and recovery
* Autonomous operation (some mode of operation should be completely non-interactive, requiring no human intervention to complete [[User:RB|RB]])
** [[User:Joachim Metz|Joachim]] as much as possible, but allow it to be overridden by the user
* Tested on 500GB-sized images. Should be able to carve a 500GB image in roughly 50% longer than it takes to read the image.
** Perhaps allocate a percentage budget per-validator (i.e. each validator adds N% to the carving time) [[User:RB|RB]]
** [[User:Joachim Metz|Joachim]] have multiple carving phases for a precision/speed trade-off?
* Parallelizable
** [[User:Joachim Metz|Joachim]] tunable for different architectures
* Configuration:
** Capability to parse some existing carvers' configuration files, either on-the-fly or as a one-way converter.
** Disengage the internal configuration structure from configuration files; create parsers that present the expected structure
** [[User:Joachim Metz|Joachim]] The validator should deal with the file structure; the carving algorithm should not know anything about the file structure (as in the revit07 design)
** Either extend the Scalpel/Foremost syntaxes for extended features or use a tertiary syntax ([[User:Joachim Metz|Joachim]] I would prefer a derivative of the revit07 configuration syntax, which has already encountered some of the problems of defining file structure in a configuration file)
* Can output an audit.txt file.
* [[User:Joachim Metz|Joachim]] Can output a database with offset analysis values, i.e. for visualization tooling
* [[User:Joachim Metz|Joachim]] Can output a debug log for debugging the algorithm/validation
* Easy integration into ascription software.
** [[User:Joachim Metz|Joachim]] I'm no native speaker; what do you mean by "ascription software"?
:: I think this was another non-native requesting easy scriptability. [[User:RB|RB]] 14:20, 31 October 2008 (UTC)
:::: [[User:Joachim Metz|Joachim]] that makes sense ;-)
* [[User:Joachim Metz|Joachim]] When the tool outputs files, the filenames should contain the offset in the input data (in hexadecimal?)
:: [[User:Mark Stam|Mark]] I really like the fact that carved files are named after the physical or logical sector in which the file is found (photorec)
:::: [[User:Joachim Metz|Joachim]] This naming schema might cause duplicate name problems when extracting embedded files and extracting files from non-sector-aligned file systems.
* [[User:Joachim Metz|Joachim]] Should the tool allow exporting embedded files?
* [[User:Joachim Metz|Joachim]] Should the tool allow exporting fragments separately?
* [[User:Mark Stam|Mark]] I personally use photorec often for carving files in the whole volume (not only unallocated clusters), so I can store information about all potentially interesting files in MySQL
* [[User:Mark Stam|Mark]] It would also be nice if the files can be hashed immediately (MD5) so looking them up in other tools (for example EnCase) is a snap

= Ideas =
* Use as much of TSK as possible. Don't carry your own FS implementation the way photorec does.
:: [[User:Joachim Metz|Joachim]] using TSK as much as possible would not allow adding your own file system support (i.e. mobile phones, memory structures, cap files). I would propose wrapping TSK and using it as much as possible, but allowing integration of our own FS implementations.
* Extracting/carving data from [[Thumbs.db]]? I've used [[foremost]] for it with some success. [[Vinetto]] has some critical bugs :( [[User:.FUF|.FUF]] 19:18, 28 October 2008 (UTC)
:: [[User:Joachim Metz|Joachim]] this poses an interesting addition to the carver: do we want to support (let's call it) 'recursive in-file carving' (for now)? This is different from embedded files, because there is a file system structure in the file and not just another file structure

[[User:Capibara|Rob J Meijer]]:
* Use libcarvpath whenever possible and by default to avoid high storage requirements.
:: [[User:Joachim Metz|Joachim]] For easy deployment I would not opt for making an integral part of the tool solely dependent on a single external library, or the library must be integrated in the package
:: [[User:Capibara|Rob J Meijer]] Integrating libraries (libtsk, libaff, libewf, libcarvpath etc.) is bad practice; autotools are your friend IMO.
:: [[User:Joachim Metz|Joachim]] I'm not talking about integrating (shared) libraries. I'm talking about an integral part of a tool being part of its package. Why can't the tool package contain shared or static libraries for local use? A far worse thing to do is to have a large set of dependencies. The tool package should contain the most necessary code. afflib/libewf support could be detected by the autotools: a neat separation of functionality.
* Don't stop with file system detection after the first match. Often, if a partition is reused with a new FS and is not all that full yet, much of the old FS can still be valid. I have seen this with ext2/FAT. The fact that you have identified a valid FS on a partition doesn't mean there isn't an (almost) valid second FS that would yield additional files. Identifying doubly allocated space might in some cases also be relevant.
:: [[User:Joachim Metz|Joachim]] What you're saying is that dealing with file system fragments should be part of the carving algorithm
* Allow use where file-system-based carving is done by another tool, and this tool is used as a second stage on (sets of) unallocated block (pseudo) files and/or non-FS partition (pseudo) files.
:: [[User:Joachim Metz|Joachim]] I would not opt for this. The tool would be dependent on other tools and their data formats, which makes the tool difficult to maintain. I would opt to integrate the functionality of having multiple recovery phases (stages) and allow the tooling to run the phases after one another or separately.
:: [[User:Capibara|Rob J Meijer]] More generically, I feel a way should exist to communicate the 'left overs' a previous (non-open, for example LE-only) tool left.
:: [[User:Joachim Metz|Joachim]] I guess if the tool is designed to handle multiple phases it should store its data somewhere, so it should be possible to convert the results of such non-open tooling to the format required. However, I would opt to design the recovery functionality of these non-open tools into open tools, and not to limit ourselves to making translators due to the design of these non-open tools.
* Ability to be used as a library instead of a tool. Ability to access metadata through the library, and thus the ability to set metadata from the carving modules. This would be extremely useful for integrating the project into a framework like OCFA.
:: [[User:Joachim Metz|Joachim]] I guess most of the code could be integrated into libraries, but I would not opt for integrating tool functionality into a library
* A wild idea that I hope at least one person will have a liking for: it might be very interesting to look at the possibilities of using a multi-process style of module support and combining it with a least-authority design. On platforms that support AppArmor (or similar) and uid-based firewall rules, this could make for the first true POLA (principle of least authority) based forensic tool ever. POLA-based forensics tools should make for a strong integrity guard against many anti-forensics. Alternatively we could look at integrating a capability-secure language (E?) for implementation of at least the validation modules. I don't expect this idea to make it, but I hope mentioning it might spark off less strong alternatives that at least partially address the integrity + anti-forensics problem. If we can in some way introduce POLA to a wider forensics public, other tools might also pick up on it, which would be great.
:: [[User:Joachim Metz|Joachim]] Could you give an example of how you see this in action?
* [[User:Mark Stam|Mark]] I think it would be very handy to have a CSV, TSV, XML or other delimited output (log) file with information about carved files. This output file can then be stored in a database or Excel sheet (report function)

== Format syntax specification ==
* Carving data structures. For example, extract all TCP headers from an image by defining the TCP header structure and some fields (e.g. source port > 1024, dest port = 80). This will extract all data matching the pattern and write a file with the other fields. Another example is carving INFO2 structures and URL activity records from index.dat [[User:.FUF|.FUF]] 20:51, 28 October 2008 (UTC)
** This has the opportunity to be extended to the concept of "point at blob FOO and interpret it as BAR"

.FUF added:
The main idea is to allow users to define structures, for example (in Pascal-like form):

<pre>
Field1: Byte = 123;
SomeTextLength: DWORD;
SomeText: string[SomeTextLength];
Field4: Char = 'r';
...
</pre>

This will produce something like this:

<pre>
Field1 = 123
SomeTextLength = 5
SomeText = 'abcd1'
Field4 = 'r'
</pre>

(In text or raw forms.)

Opinions?

Opinion: Simple pattern identification like that may not suffice; I think Simson's original intent was not only to identify but to allow for validation routines (plugins, as the original wording was). As such, the format syntax would need to implement a large chunk of some programming language in order to be sufficiently flexible. [[User:RB|RB]]

[[User:Joachim Metz|Joachim]]
In my opinion your example is too limited. Making the revit configuration I learned you'll need a near-programming language to specify some file formats. A simple descriptive language is too limiting. I would also go for 2 bytes with endianness instead of using terminology like WORD and small integer; it's much clearer. The configuration also needs to deal with aspects like cardinality, and required and optional structures.
:: This is simply data structure carving, see the ideas above. Somebody (I cannot track so many changes per day) separated the original text. There is no need to count and join different structures. [[User:.FUF|.FUF]] 19:53, 31 October 2008 (UTC)
:::: [[User:Joachim Metz|Joachim]] This was probably me; is the text back in its original form?
:::: I started it by moving your Revit07 comment to the validator/plugin section in [http://www.forensicswiki.org/index.php?title=Carver_2.0_Planning_Page&diff=prev&oldid=7583 this edit], since I was still at that point thinking operational configuration for that section, not parser configurations. [[User:RB|RB]]
:::: [[User:Joachim Metz|Joachim]] I renamed the title to format syntax; clarity is important ;-)

Please take a look at the revit07 configuration. It's not there yet but goes a far way. Some things currently missing:
* bitwise alignment
* handling encapsulated streams (MPEG/capture files)
* handling content based formats (MBOX)

= Carving algorithm =
[[User:Joachim Metz|Joachim]]
* should we allow for multiple carving phases (runs/stages)?
:: I opt yes (separation of concerns)
* should we allow for multiple carving algorithms?
:: I opt yes, this allows testing of different approaches
* Should the algorithm try to do as much as possible in one run over the input data, to reduce I/O?
:: I opt that the tool should allow for both multiple and single runs over the input data, to minimize either I/O or CPU as the bottleneck
* Interaction between algorithm and validators
** does the algorithm pass data blocks to the validators?
** does a validator need to maintain a state?
** does a validator need to revert a state?
** How do we deal with embedded files and content validation? Do the validators call another validator?
* do we use the assumption that a data block can be used by a single file (with the exception of embedded/encapsulated files)?
* Revit07 allows for multiple concurrent result file states to deal with fragmentation. One has the attribute of being active (the preferred) and the others passive. Do we want/need something similar? The algorithm adds blocks of input data (offsets) to these result file states.
** if so, what info would these result file states require (type, list of input data blocks)?
* how do we deal with file system remainders?
** Can we abstract them and compare them against available file system information?
* Do we carve file systems in files?
:: I opt that at least the validator uses this information

----

=== [[PyFlag]]'s [[sgzip]] Format===
Supported by [[PyFlag]], a "Forensic and Log Analysis GUI" begun as a project in the Australian Department of Defence, sgzip is a seekable variant of the gzip format. By compressing blocks (of 32KB, by default) individually, sgzip allows disk images to be searched for keywords without being fully decompressed. The format does not associate metadata with images. In addition to its own sgzip format, PyFlag can also read and write the Expert Witness Compression Format and AFF.

=== [[Rapid Action Imaging Device]] (RAID)'s Format===
Though relatively little technical detail is publicly available, DIBS USA's Rapid Action Imaging Device (RAID) offers "built in [sic] integrity checking" and is designed to create an identical copy, in raw format, of one disk on another. The copy can then "be inserted into a forensic workstation".

=== [[Safeback]]'s Format===
SafeBack, a DOS-based utility designed to create exact copies of entire disks or partitions, offers a "self-authenticating" format for images, whereby [[SHA256]] hashes are stored along with data to ensure the latter's integrity. Although few technical details are disclosed publicly, SafeBack's authors claim that the software "safeguards the internally stored SHA256 values".

=== [[SDi32]]'s Format===
Imaging software designed to be used with write-blocking hardware, Vogon International's SDi32 is capable of making identical copies of disks to tape, disk, or file, with optional CRC32 and [[MD5]] fingerprints. The copies are stored in raw format.

=== [[SMART]]'s Formats===
[[SMART]], a software utility for Linux designed by the original authors of Expert Witness (now sold under the name EnCase), can store disk images as pure bitstreams (compressed or uncompressed) and also in ASR Data's [[Expert Witness]] Compression Format. Images stored in the latter format can be stored as a single file or in multiple segment files, each of which consists of a standard 13-byte header followed by a series of sections, each of type "header", "volume", "table", "next", or "done". Each section includes its type string, a 64-bit offset to the next section, its 64-bit size, padding, and a CRC, in addition to actual data or comments, if applicable. Although the format's "header" section supports free-form notes, an image can have only one such section (in its first segment file only).
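The section chain described for the Expert Witness format above (a type string, a 64-bit offset to the next section, a 64-bit size) can be sketched as follows. The exact field widths and packing here are a simplification for illustration, not the actual EWF specification:

```python
import struct

HDR = 32  # 16-byte type string + two 64-bit little-endian fields (simplified)

def build_chain(sections):
    """Pack (type, payload) pairs into a simplified EWF-like section chain."""
    blob = b""
    offset = 0
    for i, (stype, payload) in enumerate(sections):
        size = HDR + len(payload)
        # offset of the next section, or 0 to terminate the chain
        next_off = offset + size if i < len(sections) - 1 else 0
        blob += stype.ljust(16, b"\x00") + struct.pack("<QQ", next_off, size) + payload
        offset += size
    return blob

def walk_chain(blob):
    """Follow the next-section offsets, yielding (type, payload) pairs."""
    off = 0
    while True:
        stype = blob[off:off + 16].rstrip(b"\x00")
        next_off, size = struct.unpack("<QQ", blob[off + 16:off + HDR])
        yield stype, blob[off + HDR:off + size]
        if next_off == 0:
            break
        off = next_off
```

A reader built this way never needs to know a section's internals to skip it, which is how unknown section types stay forward-compatible.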
  
===Programs with no specific file format===
Several programs can read multiple file formats, but do not have their own proprietary formats.

----

== Carving scenarios ==
[[User:Joachim Metz|Joachim]]
* normal file (file structure, loose text-based structure (more a content structure?))
* fragmented file (the file exists in its entirety)
* a file fragment (the file does not exist in its entirety)
* intertwined file
* encapsulated file (MPEG/network capture)
* embedded file (JPEG thumbnail)
* obfuscation ('encrypted' PFF); this also entails encryption and/or compression
* file system in a file

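The "normal file" scenario above is the classic header/footer case. A deliberately naive sketch of that baseline, using the JPEG SOI/EOI markers (no validation, fixed maximum size, both of which a real carver would handle properly):

```python
def carve(image: bytes, header: bytes, footer: bytes, max_size: int = 1 << 20):
    """Naive header/footer carving: return (offset, candidate) pairs."""
    results = []
    start = image.find(header)
    while start != -1:
        # look for the footer within max_size bytes of the header
        end = image.find(footer, start + len(header), start + max_size)
        if end != -1:
            results.append((start, image[start:end + len(footer)]))
        start = image.find(header, start + 1)
    return results
```

A real carver would hand each candidate to a validator instead of trusting the footer match, and would cope with the fragmented and intertwined scenarios listed above.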
=File System Awareness =
==Background: Why be File System Aware?==
Advantages of being FS aware:
* You can pick up sector allocation sizes
:: [[User:Joachim Metz|Joachim]] do you mean file system block sizes?
* Some file systems may store things off sector boundaries (ReiserFS with tail packing)
* Increasingly, file systems have compression (NTFS compression)
* Carve just the sectors that are not in allocated files.

==Tasks that would be required==

==Discussion==
:: As noted above, TSK should be utilized as much as possible, particularly the filesystem-aware portion. If we want to identify filesystems outside of its supported set, it would be more worth our time to work on implementing them there than in the carver itself. [[User:RB|RB]]
:::: I guess this tool operates like [[Selective file dumper]] and can recover files in both ways (or not?). Recovering files by carving can succeed in situations where sleuthkit does nothing (e.g. a file on NTFS was deleted using ntfs-3g, or the file system was destroyed or just unknown). And we should build the list of file systems supported by the carver, not by TSK. [[User:.FUF|.FUF]] 07:08, 29 October 2008 (UTC)
:: This tool is still in the early planning stages (requirements discovery), hence few operational details (like precise modes of operation) have been fleshed out; those will and should come later. The justification for strictly using TSK for the filesystem-sensitive approach is simple: TSK has good filesystem APIs, and it would be foolish to create yet another standalone, incompatible implementation of filesystem(foo) when time would be better spent improving those in TSK, aiding other methods of analysis as well. This is the same reason individuals that have implemented several other carvers are participating: de-duplication of effort. [[User:RB|RB]]

[[User:Joachim Metz|Joachim]] I would like to have the carver (recovery tool) also do recovery using file allocation data or remainders of file allocation data.

[[User:Joachim Metz|Joachim]]
I would go as far as to ask you all to look beyond the carver as a tool, and look from the perspective of the carver as part of the forensic investigation process. In my eyes, certain information needed/acquired by the carver could also be very useful investigative information, i.e. what part of a hard disk contains empty sectors.

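The last advantage listed, carving only the sectors that are not in allocated files, amounts to turning an allocation map into byte runs for the carver. A sketch, assuming an FS-aware layer (e.g. TSK) supplies the per-block allocation bitmap:

```python
def unallocated_runs(bitmap, block_size):
    """Turn an allocation bitmap (True = block allocated) into
    (byte_offset, byte_length) runs of unallocated space for the carver."""
    runs = []
    start = None
    for i, allocated in enumerate(bitmap):
        if not allocated and start is None:
            start = i  # a run of unallocated blocks begins
        elif allocated and start is not None:
            runs.append((start * block_size, (i - start) * block_size))
            start = None
    if start is not None:  # run extends to the end of the volume
        runs.append((start * block_size, (len(bitmap) - start) * block_size))
    return runs
```

Feeding only these runs to the carver skips everything the file system already accounts for, which is where most of the speed advantage of FS awareness comes from.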
=Supportive tooling=
[[User:Joachim Metz|Joachim]]
* validator (definitions) tester (detest in revit07)
* tool to make configuration based definitions
* post carving validation
* the carver needs to provide support for FUSE mounting of carved files (carvfs)

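The carvfs/libcarvpath idea referenced here is zero-storage carving: a carved file is a list of fragments in the parent image, not a copy. A minimal sketch of the concept (the path syntax shown is illustrative, not necessarily libcarvpath's exact grammar):

```python
class CarvedFile:
    """A carved file as (offset, size) fragments in the parent image.
    Zero-storage: no data is copied until read() is called."""

    def __init__(self, fragments):
        self.fragments = fragments  # [(offset, size), ...]

    @property
    def size(self):
        return sum(s for _, s in self.fragments)

    def read(self, image: bytes) -> bytes:
        """Materialize the file by gathering its fragments from the image."""
        return b"".join(image[o:o + s] for o, s in self.fragments)

    def path(self):
        """A carvpath-like string naming the fragments, e.g. '0+4_8+4'."""
        return "_".join(f"{o}+{s}" for o, s in self.fragments)
```

This also makes recursive results cheap: a file carved out of another carved file is just a composition of two fragment maps.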
=Testing =
[[User:Joachim Metz|Joachim]]
* automated testing
* test data

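Automated testing of validators can be as simple as running each validator against known-good samples and deliberately corrupted ones. A sketch with a hypothetical validator interface (a callable from bytes to bool):

```python
def run_validator_tests(validator, good_samples, bad_samples):
    """Count checks: the validator must accept every good sample
    and reject every corrupted one. Returns (passed, failed)."""
    passed = failed = 0
    for sample in good_samples:
        if validator(sample):
            passed += 1
        else:
            failed += 1
    for sample in bad_samples:
        if not validator(sample):
            passed += 1
        else:
            failed += 1
    return passed, failed

def png_validator(data: bytes) -> bool:
    """Toy example validator: checks only the 8-byte PNG signature."""
    return data.startswith(b"\x89PNG\r\n\x1a\n")
```

Test data for this would be a corpus of intact files plus the same files truncated, bit-flipped, and fragment-shuffled.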
=Validator Construction=
Options:
* Write validators in C/C++
** [[User:Joachim Metz|Joachim]] you mean dedicated validators
* Have a scripting language for writing them (Python? Perl?), or our own?
** [[User:Joachim Metz|Joachim]] use easy-to-embed programming languages, i.e. Python or Lua
* Use existing programs (libjpeg?) as plug-in validators?
** [[User:Joachim Metz|Joachim]] define a file structure API for this

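Whichever option is chosen, the carver side mostly needs a registry mapping format names to validator callables. A sketch of such a plug-in registry; the decorator-based interface is an assumption for illustration, not a decided design:

```python
VALIDATORS = {}

def validator(fmt):
    """Decorator registering a validator plug-in under a format name."""
    def register(func):
        VALIDATORS[fmt] = func
        return func
    return register

@validator("jpeg")
def validate_jpeg(data: bytes) -> bool:
    # JPEG streams start with SOI (FF D8) and end with EOI (FF D9)
    return data[:2] == b"\xff\xd8" and data[-2:] == b"\xff\xd9"

def identify(data: bytes):
    """Ask every registered plug-in; return the matching format names."""
    return [fmt for fmt, check in VALIDATORS.items() if check(data)]
```

Dedicated C/C++ validators, embedded-script validators, and wrappers around existing libraries (libjpeg-style) could all sit behind the same callable interface.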
=Existing Code that we have=
[[User:Joachim Metz|Joachim]]
Please add any missing links

Documentation/Articles
* DFRWS2006/2007 carving challenge results
* DFRWS2008 paper on carving

Carvers
* DFRWS2006/2007 carving challenge results
* photorec (http://www.cgsecurity.org/wiki/PhotoRec)
* revit06 and revit07 (http://sourceforge.net/projects/revit/)
* s3/scarve

Possible file structure validator libraries
* diverse existing file support libraries
* libole2 (in-house experimental code for OLE2 support)
* libpff (alpha release for PFF (PST/OST) file support) (http://sourceforge.net/projects/libpff/)

Input support
* AFF (http://www.afflib.org/)
* EWF (http://sourceforge.net/projects/libewf/)
* TSK device & raw & split raw (http://www.sleuthkit.org/)

Volume/Partition support
* disktype (http://disktype.sourceforge.net/)
* testdisk (http://www.cgsecurity.org/wiki/TestDisk)
* TSK

File system support
* TSK
* photorec FS code
* implementations of FS in Linux/BSD

Content support

Zero storage support
* libcarvpath
* carvfs

POLA
* joe-e (Java)
* Emily (OCaml)
* the E language
* AppArmor
* iptables/ipfw
* minorfs
* plash

=Implementation Timeline=
# gather the available resources/ideas/wishes/needs etc. (I guess we're in this phase)
# start discussing a high level design (in terms of algorithm, facilities, information needed)
## input formats facility
## partition/volume facility
## file system facility
## file format facility
## content facility
## how to deal with fragment detection (do the validators allow for fragment detection?)
## how to deal with recombination of fragments
## do we want multiple carving phases in light of speed/precision trade-offs
# start detailing parts of the design
## Discuss options for a grammar-driven validator?
## Hard-coded plug-ins?
## Which existing code can we use?
# start building/assembling parts of the tooling for a prototype
## Implement simple file carving with validation.
## Implement gap carving
# Initial Release
# Implement the ''threaded carving'' that [[User:.FUF|.FUF]] is describing above.

[[User:Joachim Metz|Joachim]] Shouldn't multi-threaded carving (MTC) be part of the 1st version? The MT approach makes for different design decisions.

Revision as of 20:24, 20 April 2009

Many computer forensic programs, especially the all-in-one suites, use their own file formats to store information.

Independent File Formats

These file formats were developed independently of any specific forensics package.

AFF

Full details of the format and a working implementation can be downloaded from http://www.afflib.org/

===[[AFF4]]===
AFF4 is a complete redesign of the AFF format, geared towards very large corpora of images. It offers a choice of binary container formats, such as Zip, Zip64 and simple directories. Storage can be done over regular HTTP, and images can be acquired directly to a central HTTP server using WebDAV. The format includes support for maps, which are zero-copy transformations of data: instead of storing a whole new copy of a carved file, only a map of the blocks allocated to that file is stored. This makes it trivial to chop up an image in many different ways with no storage overhead (for example, splitting a memory image into the individual process address spaces, extracting TCP streams from a PCAP file, or extracting all files from a file system, all without copying). AFF4 also supports cryptography and image signing, and can use FUSE to present images transparently to clients.
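The map idea can be illustrated with a minimal sketch (a hypothetical structure for illustration only, not the actual AFF4 API): a carved file is stored purely as a list of runs pointing into the image, and reads are translated on the fly.

```python
def read_mapped(image: bytes, runs, offset: int, length: int) -> bytes:
    """runs: list of (map_offset, image_offset, run_length) tuples describing
    which image blocks make up the mapped stream. Nothing is copied at store
    time; a read walks the runs and pulls the bytes out of the image."""
    out = bytearray()
    for m_off, i_off, r_len in runs:
        # Overlap of the request [offset, offset + length) with this run.
        start = max(offset, m_off)
        end = min(offset + length, m_off + r_len)
        if start < end:
            out += image[i_off + (start - m_off): i_off + (end - m_off)]
    return bytes(out)
```

For example, a "carved file" made of the third and first 4-byte blocks of an image is just the run list `[(0, 8, 4), (4, 0, 4)]`; no file data is duplicated.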


===[[gfzip]] (generic forensic zip) file format===
Gfzip aims to provide an open file format for forensically complete, compressed and signed disk image data files. Uncompressed disk images can be used the same way [[dd]] images are, as gfzip uses a data-first, footer-last design. Gfzip uses multi-level SHA-256 digest-based integrity guards instead of SHA-1 or the deprecated MD5 algorithm. User-supplied metadata is embedded in a metadata section within the file. An important feature that gfzip focuses on is the signing of data and metadata sections using X.509 certificates.
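The multi-level digest idea can be sketched as follows (an illustration of the concept only, not gfzip's actual on-disk layout; the block size is invented): data blocks are hashed individually, and a root digest is computed over the concatenated block digests, so corruption can be localized to a block while a single root hash still guards the whole file.

```python
import hashlib

BLOCK = 64 * 1024  # illustrative block size, not gfzip's actual value

def multilevel_sha256(data: bytes):
    """Two-level SHA-256 integrity guard: per-block digests plus a
    root digest over the concatenation of the block digests."""
    block_digests = [
        hashlib.sha256(data[off:off + BLOCK]).digest()
        for off in range(0, len(data), BLOCK)
    ]
    root = hashlib.sha256(b"".join(block_digests)).hexdigest()
    return block_digests, root
```

Signing only the root digest (e.g. with an X.509 certificate) then transitively protects every block.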

==Program-Specific File Formats==
These file formats were developed for use with a specific forensics program. Sometimes they can be used with other programs whose authors have specifically reverse-engineered the software. Other times they cannot.

===Encase image file format===

Perhaps the de facto standard for forensic analyses in law enforcement, Guidance Software's EnCase Forensic uses a closed format for images. This format is heavily based on ASR Data's Expert Witness Compression Format. EnCase's Evidence File (.E01) format contains a physical bitstream of an acquired disk, prefixed with a "Case Info" header, interlaced with CRCs for every block of 64 sectors (32 KB), and followed by a footer containing an MD5 hash for the entire bitstream. Contained in the header are the date and time of acquisition, an examiner's name, notes on the acquisition, and an optional password; the header concludes with its own CRC.
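The block/CRC interleave described above can be sketched as follows. This is a simplified illustration of the idea only (function names are invented; real E01 files add compression, the Case Info header and a footer MD5):

```python
import zlib

SECTOR = 512
BLOCK_SECTORS = 64
BLOCK_SIZE = SECTOR * BLOCK_SECTORS  # 32 KB, as described above

def interleave_crcs(bitstream: bytes):
    """Split a bitstream into 64-sector blocks and pair each block with a
    CRC32, mimicking (in simplified form) the E01 CRC interleave."""
    out = []
    for off in range(0, len(bitstream), BLOCK_SIZE):
        block = bitstream[off:off + BLOCK_SIZE]
        out.append((block, zlib.crc32(block) & 0xFFFFFFFF))
    return out

def verify(blocks) -> bool:
    """Recompute every block CRC; any corrupted 32 KB block is detected."""
    return all(zlib.crc32(b) & 0xFFFFFFFF == c for b, c in blocks)
```

The per-block CRCs let a tool pinpoint corruption to a single 32 KB block instead of only detecting it via the image-wide MD5.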

Not only is the format compressible, it is also searchable. Compression is block-based, and jump tables and "file pointers" are maintained in the format's header or between blocks "to enhance speed". Disk images can be split into multiple segment files (e.g., for archival to CD or DVD).

Up to version 5 of EnCase, segment files could be no larger than 2 GB. Version 6 of EnCase removed this restriction by working around the 31-bit offset values.

The format restricts the type and quantity of metadata that can be associated with an image. Extended EWF (EWF-X), defined by the libewf project, works around this restriction by specifying new header and (digest) hash sections that store the metadata as XML strings. These EWF-X E01 files are compatible with EnCase and allow more metadata to be stored.

Though some have reverse-engineered the format for compatibility's sake, Guidance's extensions to the format remain closed.

===ILook Investigator's IDIF, IRBF, and IEIF Formats===

ILook Investigator v8 and its disk-imaging counterpart, IXimager, offer three proprietary, authenticated image formats: compressed (IDIF), non-compressed (IRBF), and encrypted (IEIF). Although few technical details are disclosed publicly, IXimager's online documentation provides some insights: IDIF "includes protective mechanisms to detect changes from the source image entity to the output form" and supports "logging of user actions within the confines of that event;" IRBF is similar to IDIF except that disk images are left uncompressed; IEIF, meanwhile, encrypts said images.

For compatibility with ILook Investigator v7 and other forensic tools, IXimager allows for the transformation of each of these formats into raw format.

===ProDiscover Family's ProDiscover image file format===

Used by Technology Pathways ProDiscover Family of security tools, the ProDiscover Image File format consists of five parts: a 16-byte Image File Header, which includes a signature and version number for an image; a 681-byte Image Data Header, which contains user-provided metadata about the image; Image Data, which comprises a single block of uncompressed data or an array of blocks of compressed data; an Array of Compressed Blocks sizes (if the Image Data is, in fact, compressed); and I/O Log Errors describing any problems during the image's acquisition.
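Given the fixed header sizes described above, the offset of each part can be computed directly. A minimal sketch for the uncompressed case (constant and function names are illustrative, not taken from any ProDiscover specification):

```python
# Fixed-size parts, per the description above.
IMAGE_FILE_HEADER = 16   # signature and version number
IMAGE_DATA_HEADER = 681  # user-provided metadata about the image

def part_offsets(data_size: int) -> dict:
    """Byte offset of each part of an *uncompressed* ProDiscover-style
    image (a compressed image would insert the block-size array)."""
    data_off = IMAGE_FILE_HEADER + IMAGE_DATA_HEADER
    return {
        "image_file_header": 0,
        "image_data_header": IMAGE_FILE_HEADER,
        "image_data": data_off,
        "io_log_errors": data_off + data_size,
    }
```

Because every part before the image data has a fixed size, a reader can seek straight to the data without parsing variable-length structures first.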

Though fairly well documented, the format is not extensible.

===PyFlag's sgzip Format===

Supported by PyFlag, a "Forensic and Log Analysis GUI" begun as a project in the Australian Department of Defence, sgzip is a seekable variant of the gzip format. By compressing blocks (of 32KB, by default) individually, sgzip allows disk images to be searched for keywords without being fully decompressed. The format does not associate metadata with images. In addition to its own sgzip format, PyFlag can also read and write the Expert Witness Compression Format and AFF.
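The block-wise scheme that makes sgzip seekable can be sketched as follows (a simplified illustration of the idea, not sgzip's actual container layout): each 32 KB block is compressed independently and an index of compressed sizes is kept, so any block can be located and inflated without decompressing everything before it.

```python
import zlib

BLOCK = 32 * 1024  # sgzip's default block size, per the description above

def compress_blocks(data: bytes):
    """Compress each 32 KB block independently and build an index of
    compressed block sizes for later random access."""
    blocks, index = [], []
    for off in range(0, len(data), BLOCK):
        c = zlib.compress(data[off:off + BLOCK])
        blocks.append(c)
        index.append(len(c))
    return b"".join(blocks), index

def read_block(stream: bytes, index, n: int) -> bytes:
    """Seek to block n using only the index, then inflate that block."""
    start = sum(index[:n])
    return zlib.decompress(stream[start:start + index[n]])
```

A keyword search can thus inflate one block at a time, which is what allows PyFlag to search compressed images without fully decompressing them.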

===Rapid Action Imaging Device (RAID)'s Format===

Though relatively little technical detail is publicly available, DIBS USA's Rapid Action Imaging Device (RAID) offers "built in [sic] integrity checking" and is designed to create an identical copy, in raw format, of one disk on another. The copy can then "be inserted into a forensic workstation".

===SafeBack's Format===

SafeBack, a DOS-based utility designed to create exact copies of entire disks or partitions, offers a "self-authenticating" format for images, whereby SHA256 hashes are stored along with data to ensure the latter's integrity. Although few technical details are disclosed publicly, SafeBack's authors claim that the software "safeguards the internally stored SHA256 values".

===SDi32's Format===

Imaging software designed to be used with write-blocking hardware, Vogon International's SDi32 is capable of making identical copies of disks to tape, disk, or file, with optional CRC32 and MD5 fingerprints. The copies are stored in raw format.

===SMART's Formats===

SMART, a software utility for Linux designed by the original authors of Expert Witness (now sold under the name of EnCase), can store disk images as pure bitstreams (compressed or uncompressed) and also in ASR Data's Expert Witness Compression Format. Images stored in the latter format can be kept in a single file or in multiple segment files, each of which consists of a standard 13-byte header followed by a series of sections, each of type "header", "volume", "table", "next", or "done". Each section includes its type string, a 64-bit offset to the next section, its 64-bit size, padding, and a CRC, in addition to actual data or comments, if applicable. Although the format's "header" section supports free-form notes, an image can have only one such section (in its first segment file only).
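The section chain of such a segment file can be walked by following the 64-bit next-section offsets. A simplified sketch (the real section descriptor also carries padding and a CRC, both omitted here, so the layout below is illustrative only):

```python
import struct

def sections(segment: bytes, first_off: int = 13):
    """Walk a simplified Expert-Witness-style section chain: each section
    starts with a type string, a 64-bit offset to the next section, and a
    64-bit size. Sections begin after the 13-byte segment header."""
    off = first_off
    while off < len(segment):
        stype, nxt, size = struct.unpack_from("<16sQQ", segment, off)
        name = stype.rstrip(b"\x00").decode()
        yield name, size
        if name == "done" or nxt <= off:  # "done" terminates the chain
            break
        off = nxt
```

A reader locates the "table" and "volume" sections this way before touching any image data.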

==Programs with no specific file format==

Several programs can read multiple file formats, but do not have their own proprietary formats.