Difference between revisions of "Word Document (DOC)"

m (DOC moved to Word (DOC): There are multiple files with the doc extension)
Line 20: Line 20:
 
== File Header ==
  
MS Word documents of version 97 (and probably earlier) begin with the file signature (in hexadecimal) d0cf11e0a1b11ae1.
+
[[Microsoft Word]] documents of version 97-2003 use the [[OLE Compound File]] (OLECF).
This signature identifies the file as an OLE Compound File (also known as a Compound Document File or Compound Binary File).
+
  
The OLE Compound File has no distinct footer and can be considered a file containing a FAT-like file system.
+
The Word Binary File format is stored in the OLECF.
  
The Word document format is placed on top of the OLE Compound File.
+
The object stream of the OLECF containing a Word document contains the string "Word.Document" with some version.
 
+
The object stream of a Word document contains the string "Word.Document" followed by a version.
+
  
 
== Encryption ==
Line 33: Line 30:
 
Versions 97/2000 encrypt documents with a very weak algorithm. Several products can break this password scheme easily, and the contents can be decrypted without discovering the password by testing all 1,099,511,627,776 (2^40) possible keys. Ultimate Zip Cracker by VDGSoftware is one utility that can perform this decryption.
 
== See Also ==
[[Media:Compdocfileformat.pdf|Microsoft Compound Document File Format]] (This is actually the OpenOffice specification)
 
 
[http://download.microsoft.com/download/0/B/E/0BE8BDD7-E5E8-422A-ABFD-4342ED7AD886/WindowsCompoundBinaryFileFormatSpecification.pdf Compound Binary File Specification by Microsoft]
 
  
Be warned that this file contains at least one error: the directory entry name length is a size in bytes, not in characters.
+
[http://download.microsoft.com/download/0/B/E/0BE8BDD7-E5E8-422A-ABFD-4342ED7AD886/Word97-2007BinaryFileFormat(doc)Specification.pdf Word 97-2007 Binary File Format by Microsoft]
  
 
== Extracting Strings ==

Revision as of 05:25, 31 January 2009

The DOC file format (document file format) usually has the .doc extension. Most of these files are Microsoft Word documents, although other word-processing software can also open them (including WordPad, WordPerfect, OpenOffice and others).

The DOC file format should not be confused with DOCX.

MIME types

The following MIME types apply to this file format:

  • application/msword
  • application/doc
  • appl/text
  • application/vnd.msword
  • application/vnd.ms-word
  • application/winword
  • application/word
  • application/x-msw6
  • application/x-msword
  • zz-application/zz-winassoc-doc

File Header

Microsoft Word documents of version 97-2003 use the OLE Compound File (OLECF).

The Word Binary File format is stored in the OLECF.

The object stream of an OLECF that holds a Word document contains the string "Word.Document" followed by a version.
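
The two checks described above can be sketched in a few lines of Python. This is only a rough heuristic, not a parser: it assumes the OLECF signature d0cf11e0a1b11ae1 (hexadecimal) at offset 0 and simply searches the raw bytes for "Word.Document" instead of walking the compound file's sectors and streams. The function names are hypothetical and chosen for illustration.

```python
# Heuristic sketch: does a file look like an OLECF-based Word document?
# Assumption: the OLECF signature is the 8 bytes d0 cf 11 e0 a1 b1 1a e1
# at the very start of the file.
OLECF_SIGNATURE = bytes.fromhex("d0cf11e0a1b11ae1")


def looks_like_olecf(data: bytes) -> bool:
    """Return True if the buffer starts with the OLECF signature."""
    return data[:8] == OLECF_SIGNATURE


def mentions_word_document(data: bytes) -> bool:
    """Return True if the raw bytes contain the ASCII string "Word.Document".

    This is a raw byte search; it does not locate the object stream itself.
    """
    return b"Word.Document" in data


def check_file(path: str):
    """Read a file and return (is_olecf, mentions_word) as two booleans."""
    with open(path, "rb") as fh:
        data = fh.read()
    return looks_like_olecf(data), mentions_word_document(data)
```

For example, check_file("/tmp/test.doc") would return a pair of booleans for the file at that path. A file that passes the first check but not the second may be another OLECF-based format that also uses the .doc container.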

Encryption

Versions 97/2000 encrypt documents with a very weak algorithm. Several products can break this password scheme easily, and the contents can be decrypted without discovering the password by testing all 1,099,511,627,776 (2^40) possible keys. Ultimate Zip Cracker by VDGSoftware is one utility that can perform this decryption.
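
The key count quoted above is exactly 2^40, which gives a feel for why an exhaustive search is practical. The short sketch below does the arithmetic; the testing rate is a purely hypothetical figure chosen for illustration, not a measurement of any real tool.

```python
# Arithmetic sketch for the exhaustive key search described above.
KEY_BITS = 40                 # 2**40 equals the key count quoted in the text
KEY_SPACE = 2 ** KEY_BITS     # 1,099,511,627,776 keys

SECONDS_PER_DAY = 86_400


def worst_case_days(keys_per_second: float) -> float:
    """Days needed to try every key at the given (assumed) testing rate."""
    return KEY_SPACE / keys_per_second / SECONDS_PER_DAY


# Hypothetical rate of one million keys per second, for illustration only.
ASSUMED_RATE = 1_000_000

if __name__ == "__main__":
    print(f"key space: {KEY_SPACE:,}")
    print(f"worst case at assumed rate: {worst_case_days(ASSUMED_RATE):.1f} days")
```

On average the correct key turns up after half the key space has been searched, so the expected time is half the worst case.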

See Also

Word 97-2007 Binary File Format by Microsoft

Extracting Strings

On a Unix-like machine, use this command to extract strings from a .doc file:

tr -d '\0' < /tmp/test.doc | strings | more

(where /tmp/test.doc is the path to your .doc file)
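
Where the Unix tools are not available, the same idea can be approximated in Python: drop every NUL byte (which collapses simple 16-bit little-endian text into ASCII) and then collect runs of printable characters. This is a sketch, not a reimplementation of strings; the function name, the minimum length of 4 and the exact set of printable characters are assumptions made for illustration.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Approximate `tr -d '\\0' | strings` on an in-memory buffer.

    Removes all NUL bytes, then returns every run of at least `min_len`
    printable ASCII characters (space through tilde, plus tab).
    """
    stripped = data.replace(b"\x00", b"")
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(stripped)]


def extract_strings_from_file(path: str, min_len: int = 4) -> list:
    """Read the whole file at `path` and extract its printable strings."""
    with open(path, "rb") as fh:
        return extract_strings(fh.read(), min_len)
```

Calling extract_strings_from_file("/tmp/test.doc") returns the strings as a list, which can then be filtered or searched. Like the shell pipeline, this approach only recovers text whose characters fall in the ASCII range.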