Difference between pages "Microsoft Office File formats" and "Category:Forensics File Formats"

From ForensicsWiki
==See Also==
*[[Tools:Document Metadata Extraction]]
*[[Media:Compdocfileformat.pdf|Microsoft Compound Document File Format]]

==External Links==
===Microsoft.com links===
* [http://msdn.microsoft.com/en-us/library/aa338205.aspx Introducing the Office (2007) Open XML File Formats]
* [http://msdn.microsoft.com/en-us/library/cc313105.aspx Microsoft Office Binary File Format Documents]
* [http://www.microsoft.com/interop/docs/OfficeBinaryFormats.mspx Microsoft Office Binary (doc, xls, ppt) File Formats]
* [http://office.microsoft.com/en-us/products/ha102058151033.aspx Ecma Office Open XML File Formats overview]
* [http://office.microsoft.com/en-us/help/HA100069351033.aspx Introduction to new file name extensions and Open XML Formats]

===Evaluations===
* [http://www.joelonsoftware.com/items/2008/02/19.html Why are the Microsoft Office file formats so complicated? (And some workarounds)]

===Wikipedia===
* [http://en.wikipedia.org/wiki/Microsoft_Word Wikipedia article on Microsoft Word]
* [http://en.wikipedia.org/wiki/Object_Linking_and_Embedding Wikipedia article on OLE]

Revision as of 20:13, 20 April 2009

Many computer forensic programs, especially the all-in-one suites, use their own file formats to store information.

AFF

Full details of the format and a working implementation can be downloaded from http://www.afflib.org/

AFF4

AFF4 is a complete redesign of the AFF format, geared towards very large corpora of images. It offers a choice of container formats such as Zip, Zip64 and simple directories. Storage can use regular HTTP, and images can be acquired directly to a central HTTP server using WebDAV. The format supports maps, which are zero-copy transformations of data: for example, instead of storing a whole new copy of a carved file, only a map of the blocks allocated to that file is stored. This makes it trivial to slice an image in many different ways without storage overhead (for example, splitting a memory image into the address spaces of individual processes, extracting TCP streams from a PCAP file, or extracting all files from a filesystem, all without copying). AFF4 also supports cryptography and image signing, and can use FUSE to present images transparently to clients.
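
The idea behind maps can be illustrated with a minimal sketch, assuming a map is simply an ordered list of (offset, length) extents into a backing image. The `BlockMap` class and its methods are hypothetical and do not reflect the actual AFF4 map stream format; the point is only that a carved file can be read through the map without copying its blocks.

```python
import io
from typing import BinaryIO, List, Tuple


class BlockMap:
    """Hypothetical zero-copy view: a logical stream defined by extents of a backing image."""

    def __init__(self, backing: BinaryIO, extents: List[Tuple[int, int]]):
        self.backing = backing
        # Each extent is (offset_in_backing_image, length); no data is copied.
        self.extents = extents
        self.size = sum(length for _, length in extents)

    def read(self, pos: int, count: int) -> bytes:
        """Read up to `count` bytes starting at logical position `pos`."""
        out = bytearray()
        logical_start = 0
        for offset, length in self.extents:
            logical_end = logical_start + length
            if count > 0 and pos < logical_end:
                skip = pos - logical_start          # where the request begins inside this extent
                take = min(length - skip, count)
                self.backing.seek(offset + skip)
                out += self.backing.read(take)
                pos += take
                count -= take
            logical_start = logical_end
        return bytes(out)


# Usage: a "carved file" made of two non-contiguous runs of a tiny image.
image = io.BytesIO(b"AAAAbbbbCCCCdddd")
carved = BlockMap(image, [(4, 4), (12, 4)])
print(carved.read(0, carved.size))  # b'bbbbdddd'
```

The only storage cost of the carved file is the extent list; each read pays one seek per extent it touches.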

EnCase

Perhaps the de facto standard for forensic analyses in law enforcement, Guidance Software's EnCase Forensic uses a closed format for images. This format is heavily based on ASR Data's Expert Witness Compression Format. EnCase's Evidence File (.E01) format contains a physical bitstream of an acquired disk, prefixed with a "Case Info" header, interlaced with CRCs for every block of 64 sectors (32 KB), and followed by a footer containing an MD5 hash for the entire bitstream. Contained in the header are the date and time of acquisition, an examiner's name, notes on the acquisition, and an optional password; the header concludes with its own CRC.
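
As a rough illustration of this two-level integrity scheme (one checksum per 64-sector chunk plus a single MD5 over the whole bitstream), the sketch below computes both in one pass. It assumes 512-byte sectors and uses `zlib.crc32` as the per-chunk checksum; the function name is hypothetical, and the sketch does not produce the actual .E01 layout.

```python
import hashlib
import io
import zlib
from typing import BinaryIO, List, Tuple

SECTOR_SIZE = 512                 # assumption: 512-byte sectors
CHUNK_SIZE = 64 * SECTOR_SIZE     # 64 sectors = 32 KB, as described above


def chunk_checksums(stream: BinaryIO) -> Tuple[List[int], str]:
    """Return one CRC32 per 32 KB chunk and an MD5 hex digest of the whole bitstream."""
    crcs: List[int] = []
    md5 = hashlib.md5()
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        crcs.append(zlib.crc32(chunk))  # localizes corruption to a single chunk
        md5.update(chunk)               # vouches for the acquired bitstream as a whole
    return crcs, md5.hexdigest()


# Usage: two full chunks plus a partial one.
disk = b"\x00" * (2 * CHUNK_SIZE + 100)
crcs, digest = chunk_checksums(io.BytesIO(disk))
print(len(crcs), digest)
```

Per-chunk checksums let an examiner tell which 32 KB region is damaged, while the single MD5 covers the image as a whole.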

Not only is the format compressible, it is also searchable. Compression is block-based, and jump tables and "file pointers" are maintained in the format's header or between blocks "to enhance speed". Disk images can be split into multiple segment files (e.g., for archival to CD or DVD).
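
Block-based compression is what makes random access possible: because each block is compressed on its own, a table of block offsets lets a reader decompress only the blocks that hold the requested bytes. A minimal sketch, assuming zlib compression, 32 KB blocks and an in-memory offset table; the function names are hypothetical and the layout is not EnCase's.

```python
import zlib
from typing import List, Tuple

BLOCK_SIZE = 32 * 1024  # assumption: 32 KB uncompressed blocks


def compress_blocks(data: bytes) -> Tuple[bytes, List[int]]:
    """Compress each block independently; return the blob and a table of block start offsets."""
    blob = bytearray()
    table: List[int] = []
    for start in range(0, len(data), BLOCK_SIZE):
        table.append(len(blob))                       # jump-table entry for this block
        blob += zlib.compress(data[start:start + BLOCK_SIZE])
    table.append(len(blob))                           # sentinel: end of the last block
    return bytes(blob), table


def read_byte_range(blob: bytes, table: List[int], pos: int, count: int) -> bytes:
    """Read `count` bytes at uncompressed offset `pos`, decompressing only the blocks involved."""
    out = bytearray()
    while count > 0:
        index = pos // BLOCK_SIZE
        if index >= len(table) - 1:
            break                                     # past the last block
        block = zlib.decompress(blob[table[index]:table[index + 1]])
        within = pos % BLOCK_SIZE
        piece = block[within:within + count]
        if not piece:
            break                                     # past the end of a short final block
        out += piece
        pos += len(piece)
        count -= len(piece)
    return bytes(out)


# Usage: read 20 bytes that straddle the boundary between block 0 and block 1.
data = bytes(range(256)) * 300
blob, table = compress_blocks(data)
print(read_byte_range(blob, table, 32760, 20) == data[32760:32780])  # True
```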

Up to version 5 of EnCase, segment files could be no larger than 2 GB. Version 6 of EnCase removed this restriction by working around the 31-bit offset values.
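
The 2 GB figure is what 31 bits can address: 2^31 = 2,147,483,648 bytes, or 2 GiB. The sketch below only does that arithmetic and estimates how many segment files a disk would need under such a cap; the helper name is hypothetical, and it ignores the header and table overhead carried by each segment file.

```python
OFFSET_BITS = 31
MAX_SEGMENT_BYTES = 2 ** OFFSET_BITS   # 2,147,483,648 bytes, i.e. 2 GiB


def segments_needed(disk_bytes: int, segment_bytes: int = MAX_SEGMENT_BYTES) -> int:
    """Minimum number of segment files if each may hold at most `segment_bytes` of image data."""
    # Integer ceiling division; an empty disk still needs one segment file.
    return max(1, (disk_bytes + segment_bytes - 1) // segment_bytes)


print(MAX_SEGMENT_BYTES)               # 2147483648
print(segments_needed(80 * 10**9))     # 38 segment files for an 80 GB disk
```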

The format restricts the type and quantity of metadata that can be associated with an image. Extended EWF (EWF-X), defined by the libewf project, works around this restriction by specifying a new header section and a new (digest) hash section that store the metadata as XML strings. These EWF-X E01 files remain compatible with EnCase while storing more metadata.
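
To make XML-encoded metadata concrete, the sketch below serializes arbitrary case metadata into an XML string with Python's standard `xml.etree.ElementTree`. The element names (`xheader`, `case_number`, `examiner_name`) and the sample values are hypothetical placeholders, not libewf's actual EWF-X schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_metadata_xml(fields: Dict[str, str]) -> str:
    """Serialize free-form metadata as an XML string (hypothetical element names)."""
    root = ET.Element("xheader")
    for name, value in fields.items():
        # Each key must be a valid XML element name; any number of fields can be added,
        # which is the flexibility a fixed-layout header lacks.
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


xml_text = build_metadata_xml({"case_number": "2009-042", "examiner_name": "J. Doe"})
print(xml_text)
# <xheader><case_number>2009-042</case_number><examiner_name>J. Doe</examiner_name></xheader>
```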

Though some have reverse-engineered the format for compatibility's sake, Guidance's extensions to the format remain closed.

FTK Imager (FTK's) File Formats

A popular alternative to EnCase, AccessData's Forensic Toolkit (FTK) supports storage of disk images in EnCase's or SMART's file format, as well as in raw (dd) format. With IsoBuster technology built in, FTK Imager images CDs to an ISO/CUE file pair, including multi-session and open-session CDs.

gfzip (generic forensic zip) file format

Gfzip aims to provide an open file format for 'forensic complete', 'compressed' and 'signed' disk image data files. Uncompressed disk images can be used the same way as dd images, because gfzip uses a data-first, footer-last design. Gfzip uses multi-level SHA256 digest-based integrity guards instead of SHA1 or the deprecated MD5 algorithm. User-supplied metadata is embedded in a metadata section within the file. A central focus of gfzip is signing the data and metadata sections with X.509 certificates.
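
The data-first, footer-last layout and the multi-level digests can be sketched with a toy format, assuming raw data at offset 0, then a footer holding one SHA-256 per block plus a top-level SHA-256 over those block digests, then an 8-byte footer length. This layout, the block size and the function names are hypothetical; the real gfzip format also carries metadata and X.509 signatures, which are omitted here.

```python
import hashlib
import struct
from typing import List

BLOCK_SIZE = 4096  # assumption: toy block size


def block_digests(data: bytes) -> List[bytes]:
    """First level: one SHA-256 digest per fixed-size block."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]


def write_image(data: bytes) -> bytes:
    """Raw data first (readable like a dd image), then the footer, then the footer length."""
    leaves = block_digests(data)
    top = hashlib.sha256(b"".join(leaves)).digest()    # second level: digest of the digests
    footer = struct.pack("<QI", len(data), len(leaves)) + b"".join(leaves) + top
    return data + footer + struct.pack("<Q", len(footer))


def verify_image(image: bytes) -> bool:
    """Re-hash the data section and compare it with the digests stored in the footer."""
    (footer_len,) = struct.unpack("<Q", image[-8:])
    footer = image[-8 - footer_len:-8]
    data_len, count = struct.unpack("<QI", footer[:12])
    stored = [footer[12 + 32 * i:12 + 32 * (i + 1)] for i in range(count)]
    top = footer[12 + 32 * count:12 + 32 * count + 32]
    leaves = block_digests(image[:data_len])
    return leaves == stored and hashlib.sha256(b"".join(leaves)).digest() == top


# Usage: the first len(data) bytes of the image are the untouched raw data.
data = b"evidence" * 2000
image = write_image(data)
print(image[:len(data)] == data, verify_image(image))  # True True
```

Keeping per-block digests under a top-level digest means a verifier can name the exact block that changed instead of only reporting that the image as a whole no longer matches.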

ILook Investigator's IDIF, IRBF, and IEIF Formats

ILook Investigator v8 and its disk-imaging counterpart, IXimager, offer three proprietary, authenticated image formats: compressed (IDIF), non-compressed (IRBF), and encrypted (IEIF). Although few technical details are disclosed publicly, IXimager's online documentation provides some insights: IDIF "includes protective mechanisms to detect changes from the source image entity to the output form" and supports "logging of user actions within the confines of that event;" IRBF is similar to IDIF except that disk images are left uncompressed; IEIF, meanwhile, encrypts said images.

For compatibility with ILook Investigator v7 and other forensic tools, IXimager allows for the transformation of each of these formats into raw format.

ProDiscover Family's ProDiscover Image File Format

Used by Technology Pathways' ProDiscover Family of security tools, the ProDiscover Image File format consists of five parts: a 16-byte Image File Header, which includes a signature and version number for an image; a 681-byte Image Data Header, which contains user-provided metadata about the image; Image Data, which comprises a single block of uncompressed data or an array of blocks of compressed data; an Array of Compressed Blocks sizes (if the Image Data is, in fact, compressed); and I/O Log Errors describing any problems during the image's acquisition.

Though fairly well documented, the format is not extensible.
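
When the Image Data is compressed, the array of compressed block sizes is enough to locate any block: its starting offset is the sum of the sizes before it. A minimal sketch, assuming the array has already been read into a list of integers and that offsets are relative to the start of the Image Data; the function name is hypothetical and the on-disk field widths of the ProDiscover format are not modelled.

```python
from itertools import accumulate
from typing import List, Tuple


def block_extents(compressed_sizes: List[int]) -> List[Tuple[int, int]]:
    """Turn an array of compressed block sizes into (start_offset, size) pairs."""
    # Prefix sums give each block's start relative to the beginning of the Image Data.
    starts = [0] + list(accumulate(compressed_sizes))[:-1]
    return list(zip(starts, compressed_sizes))


print(block_extents([1200, 800, 1500]))  # [(0, 1200), (1200, 800), (2000, 1500)]
```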

PyFlag's sgzip Format

Supported by PyFlag, a "Forensic and Log Analysis GUI" begun as a project in the Australian Department of Defence, sgzip is a seekable variant of the gzip format. By compressing blocks (of 32KB, by default) individually, sgzip allows disk images to be searched for keywords without being fully decompressed. The format does not associate metadata with images. In addition to its own sgzip format, PyFlag can also read and write the Expert Witness Compression Format.
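
Keyword search without full decompression works because each block can be inflated on its own. The sketch below inflates one block at a time and keeps a short tail of the previous block, so a keyword straddling a block boundary is still found. It assumes the blocks are already available as a list of zlib-compressed byte strings, all full-size except possibly the last; this is not PyFlag's sgzip reader, and the names are hypothetical.

```python
import zlib
from typing import Iterable, List


def search_blocks(blocks: Iterable[bytes], keyword: bytes, block_size: int) -> List[int]:
    """Return uncompressed offsets of `keyword`, holding only one inflated block at a time."""
    if not keyword:
        raise ValueError("keyword must be non-empty")
    hits: List[int] = []
    carry = b""                                   # tail of the previous block
    for index, compressed in enumerate(blocks):
        plain = zlib.decompress(compressed)
        window = carry + plain
        base = index * block_size - len(carry)    # uncompressed offset of window[0]
        pos = window.find(keyword)
        while pos != -1:
            hits.append(base + pos)
            pos = window.find(keyword, pos + 1)
        # Keep len(keyword) - 1 bytes: enough for a boundary match, too few for a duplicate hit.
        carry = window[-(len(keyword) - 1):] if len(keyword) > 1 else b""
    return hits


# Usage: the first "needle" straddles the boundary between blocks 0 and 1.
BLOCK = 16
data = b"x" * 14 + b"needle" + b"y" * 20 + b"needle"
blocks = [zlib.compress(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]
print(search_blocks(blocks, b"needle", BLOCK))  # [14, 40]
```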

Rapid Action Imaging Device (RAID)'s Format

Though relatively little technical detail is publicly available, DIBS USA's Rapid Action Imaging Device (RAID) offers "built in [sic] integrity checking" and is designed to create an identical copy in raw format of one disk on another. The copy can then "be inserted into a forensic workstation".

SafeBack's Format

SafeBack, a DOS-based utility designed to create exact copies of entire disks or partitions, offers a "self-authenticating" format for images, whereby SHA256 hashes are stored along with data to ensure the latter's integrity. Although few technical details are disclosed publicly, SafeBack's authors claim that the software "safeguards the internally stored SHA256 values".

SDi32's Format

Imaging software designed to be used with write-blocking hardware, Vogon International's SDi32 is capable of making identical copies of disks to tape, disk, or file, with optional CRC32 and MD5 fingerprints. The copies are stored in raw format.
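
Computing both fingerprints while copying avoids a second pass over the source. A minimal sketch, assuming ordinary file-like objects for source and destination; the function name and default buffer size are hypothetical, and write-blocking is left to the hardware as described above.

```python
import hashlib
import io
import zlib
from typing import BinaryIO, Tuple


def copy_with_fingerprints(src: BinaryIO, dst: BinaryIO,
                           bufsize: int = 1 << 20) -> Tuple[str, str]:
    """Copy src to dst unchanged; return (CRC32 as 8 hex digits, MD5 hex digest)."""
    crc = 0
    md5 = hashlib.md5()
    while True:
        buf = src.read(bufsize)
        if not buf:
            break
        dst.write(buf)               # raw copy: bytes are written exactly as read
        crc = zlib.crc32(buf, crc)   # running CRC32 over the whole stream
        md5.update(buf)
    return f"{crc:08x}", md5.hexdigest()


# Usage: "123456789" is the standard CRC-32 check string.
src_data = b"123456789"
dst = io.BytesIO()
crc, md5_hex = copy_with_fingerprints(io.BytesIO(src_data), dst)
print(crc)  # cbf43926
```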


SMART's Formats

SMART, a software utility for Linux designed by the original authors of Expert Witness (now sold under the name of EnCase), can store disk images as pure bitstreams (compressed or uncompressed) and also in ASR Data's Expert Witness Compression Format. Images stored in the latter format can be stored as a single file or in multiple segment files, each of which consists of a standard 13-byte header followed by a series of sections, each of type "header", "volume", "table", "next", or "done". Each section includes its type string, a 64-bit offset to the next section, its 64-bit size, padding, and a CRC, in addition to actual data or comments, if applicable. Although the format's "header" section supports free-form notes, an image can have only one such section (in its first segment file only).
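
The section chain can be walked by reading each section descriptor and following its next-section offset until a "next" or "done" section ends the segment. The sketch below assumes the descriptor layout described in the libewf project's notes on EWF: a 16-byte NUL-padded type string, a little-endian 64-bit next offset, a 64-bit size, 40 bytes of padding and a 4-byte checksum (76 bytes in total), with the first section starting right after the 13-byte file header. Treat these field widths as assumptions; the function and helper names are hypothetical, and the checksum is not verified.

```python
import io
import struct
from typing import BinaryIO, List, Tuple

FILE_HEADER_SIZE = 13                        # standard segment-file header, per the text
DESCRIPTOR = struct.Struct("<16sQQ40sI")     # assumed: type, next offset, size, padding, checksum


def walk_sections(segment: BinaryIO) -> List[Tuple[str, int, int]]:
    """Return (type, offset, size) for each section, following the next-section offsets."""
    sections: List[Tuple[str, int, int]] = []
    offset = FILE_HEADER_SIZE
    while True:
        segment.seek(offset)
        raw = segment.read(DESCRIPTOR.size)
        if len(raw) < DESCRIPTOR.size:
            break                                # truncated segment file
        type_bytes, next_offset, size, _padding, _checksum = DESCRIPTOR.unpack(raw)
        section_type = type_bytes.rstrip(b"\x00").decode("ascii")
        sections.append((section_type, offset, size))
        # "done" ends the image, "next" ends this segment; a non-advancing offset would loop.
        if section_type in ("done", "next") or next_offset <= offset:
            break
        offset = next_offset
    return sections


# Usage with a synthetic segment: header, then "header" (76 + 10 data bytes), "volume", "done".
def _section(kind: str, next_offset: int, size: int) -> bytes:
    return DESCRIPTOR.pack(kind.encode("ascii"), next_offset, size, b"", 0)


demo = (b"\x00" * FILE_HEADER_SIZE + _section("header", 99, 86) + b"x" * 10
        + _section("volume", 175, 76) + _section("done", 175, 76))
print(walk_sections(io.BytesIO(demo)))  # [('header', 13, 86), ('volume', 99, 76), ('done', 175, 76)]
```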