Difference between pages "Word Document (DOC)" and "Encase image file format"

From Forensics Wiki
(Difference between pages)
(Extracting Strings)
 
 
Line 1: Line 1:
The '''Word Document (DOC) file format''' has the '''.doc''' extension. This file type originates from [[Microsoft Word]]. However, other word processing software can be used to display these files as well. These include:
+
[[EnCase]] uses a closed format for images which is reportedly based on [http://www.asrdata.com/SMART/whitepaper.html ASR Data's Expert Witness Compression Format]. The evidence files, or E01 files, contain a physical bitstream of an acquired disk, prefixed with a "Case Info" header, interlaced with CRCs for every block of 64 sectors (32 KB), and followed by a footer containing an MD5 hash for the entire bitstream. The header contains the date and time of acquisition, an examiner's name, notes on the acquisition, and an optional password; the header concludes with its own CRC.
* [[WordPad]]
+
* [[WordPerfect]]
+
* [[OpenOffice]]
+
* [[AbiWord]]
+
  
The Word DOC file format should not be confused with [[DOCX]].
+
EnCase can store media data in multiple evidence files, which are called segment files. Each segment file consists of multiple sections. Each section starts with a section start definition, which contains a section type.
  
== MIME types ==
+
Up to EnCase 5, segment files were limited to 2 GiB because of the internal 31-bit file offset representation. This limitation was lifted in EnCase 6 using a base offset workaround.
  
The following [[MIME types]] apply to this [[file format]]:
+
Since at least EnCase 3, the case info header is contained in the "header" section, which is defined twice within the file; both copies contain the same information.
  
* application/msword
+
With EnCase 4 an additional "header2" section was added. The "header" section now appears only once, but the new "header2" section appears twice.
* application/doc
+
* appl/text
+
* application/vnd.msword
+
* application/vnd.ms-word
+
* application/winword
+
* application/word
+
* application/x-msw6
+
* application/x-msword
+
* zz-application/zz-winassoc-doc
+
  
== File signature ==
+
Version 3 of the EnCase format introduced an "error2" section that records the location and number of bad sector chunks. Areas that cannot be read are filled with zeros, and EnCase then shows the user which areas could not be read when the image was acquired. The granularity of unreadable chunks appears to be 32 KB.
  
[[Microsoft Word]] documents of version 97-2003 use the [[OLE Compound File]] (OLECF). These files therefore have the OLECF file signature
+
In EnCase 5 the number of sectors per block (chunk) can vary.
  
The object stream of the OLECF containing a Word document contains the string "Word.Document" with some version.
+
EnCase, at least in versions 3, 4 and 5, can hash the data of the media it acquires.
 +
It does this by calculating an MD5 hash of the original media data and adding a hash section
 +
to the last of the segment files.
  
== Word 97-2003 documents ==
+
== See Also ==
  
The Word Binary File format is stored in the OLECF using multiple streams:
+
[[EnCase]]
* WordDocument stream
+
* Table stream (0Table, 1Table)
+
* Data stream
+
  
== Encryption ==
+
== External Links ==  
  
Versions 97/2000 encrypt documents with a very weak algorithm. This password scheme can be broken easily by several different products and it is possible to decrypt the contents without discovering the password. This is done by testing all 1,099,511,627,776 possible keys. Ultimate Zip Cracker by VDGSoftware is one utility that can perform this decryption.
+
* A great deal of information about the format has been documented by the [http://libewf.sourceforge.net libewf project], including some of the [http://downloads.sourceforge.net/libewf/ewf_file_format.pdf E01 file format specifications].
== See Also==
+
* [http://www.cfreds.nist.gov/v2/Basic_Mac_Image.html Sample image in EnCase, iLook, and dd format] - From the [[Computer Forensic Reference Data Sets]] Project
  
[http://download.microsoft.com/download/0/B/E/0BE8BDD7-E5E8-422A-ABFD-4342ED7AD886/Word97-2007BinaryFileFormat(doc)Specification.pdf Word 97-2007 Binary File Format by Microsoft]
+
[[Category:Forensics File Format]]
 
+
== Extracting Strings ==
+
 
+
On a Unix-like machine, try this command to extract strings from a .doc file:
+
 
+
<code>
+
cat /tmp/test.doc | tr -d \\0  | strings | more
+
</code>
+
 
+
(where /tmp/test.doc is the path to your .doc file)
+
 
+
Note that a Word 97 or later document can contain both extended ASCII text in codepage 1252 and UTF-16 little-endian text, so the basic Unix strings utility is of limited use. Use the Sleuth Kit strings tool or EnCase instead.
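As a rough illustration of why both encodings matter, the following Python sketch scans a byte buffer for runs of printable characters stored either as single bytes or as UTF-16 little-endian pairs. It is a simplification under stated assumptions: it only recognises the printable ASCII range (not the extended codepage 1252 characters or non-Latin UTF-16 text), and the names `MIN_LEN`, `ASCII_RUN`, `UTF16_RUN` and `extract_strings` are hypothetical.

```python
import re

MIN_LEN = 4  # assumed minimum run length, similar in spirit to strings(1)

# Runs of printable ASCII stored one byte per character.
ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
# Runs of printable ASCII stored as UTF-16LE (each character followed by 0x00).
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def extract_strings(data: bytes) -> list:
    """Return single-byte runs first, then UTF-16LE runs, as Python strings."""
    found = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in UTF16_RUN.finditer(data)]
    return found


# Hypothetical buffer mixing both encodings, as a .doc stream might.
sample = b"\x00\x01Hello world\x00\xff" + "Forensics".encode("utf-16-le") + b"\x00\x00"
strings_found = extract_strings(sample)
print(strings_found)
```

To try it on a real file, the bytes could be read with `open(path, "rb").read()` and passed to `extract_strings`.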
+
 
+
[[Category:File Formats]]
+

Revision as of 05:18, 31 January 2009

EnCase uses a closed format for images which is reportedly based on ASR Data's Expert Witness Compression Format. The evidence files, or E01 files, contain a physical bitstream of an acquired disk, prefixed with a "Case Info" header, interlaced with CRCs for every block of 64 sectors (32 KB), and followed by a footer containing an MD5 hash for the entire bitstream. The header contains the date and time of acquisition, an examiner's name, notes on the acquisition, and an optional password; the header concludes with its own CRC.
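The block-and-hash arrangement described above can be sketched in Python. This is only an illustration, not the EnCase implementation: the 512-byte sector size is an assumption, `zlib.crc32` is a stand-in because the exact checksum algorithm is not stated here, and `checksum_blocks` is a hypothetical helper name.

```python
import hashlib
import zlib

SECTOR_SIZE = 512                              # assumed bytes per sector
SECTORS_PER_BLOCK = 64                         # from the text: 64 sectors per block
BLOCK_SIZE = SECTOR_SIZE * SECTORS_PER_BLOCK   # 32 KiB


def checksum_blocks(bitstream: bytes):
    """Return (list of per-block checksums, MD5 hex digest of the whole bitstream)."""
    md5 = hashlib.md5()
    checksums = []
    for start in range(0, len(bitstream), BLOCK_SIZE):
        block = bitstream[start:start + BLOCK_SIZE]
        # Stand-in checksum; the algorithm EnCase actually uses may differ.
        checksums.append(zlib.crc32(block) & 0xFFFFFFFF)
        md5.update(block)
    return checksums, md5.hexdigest()


data = bytes(100_000)  # hypothetical 100,000-byte "disk" filled with zeros
block_checksums, digest = checksum_blocks(data)
print(len(block_checksums), digest)
```

The per-block checksums let a reader localise corruption to a single 32 KB block, while the single MD5 covers the bitstream as a whole.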

EnCase can store media data in multiple evidence files, which are called segment files. Each segment file consists of multiple sections. Each section starts with a section start definition, which contains a section type.
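A section start definition could be parsed along the following lines. The text here does not give the on-disk layout, so every field width in this Python sketch is an assumption, modelled on the layout that the libewf documentation appears to describe (16-byte type string, two 64-bit little-endian integers, 40 bytes of padding, and a 32-bit Adler-32 checksum). The function names are hypothetical, and the layout should be verified against that documentation before use.

```python
import struct
import zlib

# Assumed layout: type (16 bytes), next section offset (uint64),
# section size (uint64), padding (40 bytes), Adler-32 checksum (uint32).
DESCRIPTOR = struct.Struct("<16sQQ40sI")


def parse_section_start(raw: bytes) -> dict:
    """Decode one assumed 76-byte section start definition."""
    type_bytes, next_offset, size, _padding, checksum = DESCRIPTOR.unpack(raw)
    return {
        "type": type_bytes.rstrip(b"\x00").decode("ascii"),
        "next_offset": next_offset,
        "size": size,
        # The checksum is assumed to cover everything before the checksum field.
        "checksum_ok": (zlib.adler32(raw[:-4]) & 0xFFFFFFFF) == checksum,
    }


def build_section_start(section_type: str, next_offset: int, size: int) -> bytes:
    """Build a synthetic definition for testing the parser (not a real E01 writer)."""
    body = struct.pack("<16sQQ40s", section_type.encode("ascii"), next_offset, size, b"")
    return body + struct.pack("<I", zlib.adler32(body) & 0xFFFFFFFF)


raw = build_section_start("header2", 1234, 500)
section = parse_section_start(raw)
print(section)
```

Under these assumptions, walking a segment file would amount to reading one definition, acting on its type, and seeking to the recorded next offset.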

Up to EnCase 5, segment files were limited to 2 GiB because of the internal 31-bit file offset representation. This limitation was lifted in EnCase 6 using a base offset workaround.
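The arithmetic behind the 2 GiB limit is short enough to check in Python. The sketch below assumes that a 31-bit offset addresses 2^31 bytes and, as a simplification, ignores header and checksum overhead and compression when estimating the number of segment files; `segments_needed` is a hypothetical helper.

```python
OFFSET_BITS = 31
MAX_SEGMENT_BYTES = 2 ** OFFSET_BITS  # 2,147,483,648 bytes = 2 GiB


def segments_needed(media_bytes: int, segment_bytes: int = MAX_SEGMENT_BYTES) -> int:
    """Lower bound on segment files: ceiling of media size over segment size."""
    return -(-media_bytes // segment_bytes)  # ceiling division on integers


disk_bytes = 500 * 10**9  # hypothetical 500 GB disk
print(MAX_SEGMENT_BYTES, segments_needed(disk_bytes))
```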

Since at least EnCase 3, the case info header is contained in the "header" section, which is defined twice within the file; both copies contain the same information.

With EnCase 4 an additional "header2" section was added. The "header" section now appears only once, but the new "header2" section appears twice.

Version 3 of the EnCase format introduced an "error2" section that records the location and number of bad sector chunks. Areas that cannot be read are filled with zeros, and EnCase then shows the user which areas could not be read when the image was acquired. The granularity of unreadable chunks appears to be 32 KB.
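The zero-fill behaviour can be pictured with a small Python sketch. It is a model of the idea only, not of the "error2" section's actual encoding: the reader callback, the `acquire` function and the run list of `[first_chunk, count]` pairs are all hypothetical.

```python
CHUNK_SIZE = 32 * 1024  # 32 KB granularity reported in the text


def acquire(read_chunk, total_chunks: int):
    """Read chunks in order; zero-fill unreadable ones and record them as runs."""
    image = bytearray()
    bad_runs = []  # each entry is [first_chunk_index, number_of_chunks]
    for index in range(total_chunks):
        try:
            data = read_chunk(index)
        except OSError:
            data = bytes(CHUNK_SIZE)  # unreadable area becomes zeros
            if bad_runs and bad_runs[-1][0] + bad_runs[-1][1] == index:
                bad_runs[-1][1] += 1  # extend the current run of bad chunks
            else:
                bad_runs.append([index, 1])
        image += data
    return bytes(image), bad_runs


def flaky_reader(index: int) -> bytes:
    """Simulated source in which chunks 2, 3 and 7 cannot be read."""
    if index in (2, 3, 7):
        raise OSError("simulated unreadable chunk")
    return bytes([index + 1]) * CHUNK_SIZE


image, bad_runs = acquire(flaky_reader, 10)
print(bad_runs)
```

Because the bad areas are recorded separately, a viewer can distinguish zeros that were really on the disk from zeros that stand in for unreadable data.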

In EnCase 5 the number of sectors per block (chunk) can vary.

EnCase, at least in versions 3, 4 and 5, can hash the data of the media it acquires. It does this by calculating an MD5 hash of the original media data and adding a hash section to the last of the segment files.

See Also

EnCase

External Links