Difference between pages "Word Document (DOC)" and "HTML"

From ForensicsWiki
(Difference between pages)
The '''DOC file format''' ('''document file format''') usually has the '''.doc''' extension. Most such files are [[Microsoft]] [[Word]] documents, but other text editing software can also open them (including [[WordPad]], [[WordPerfect]], [[OpenOffice]] and others).
+
The '''Hypertext Markup Language''' ('''HTML''') [[file format]] is used to create and display web pages.
  
The DOC file format should not be confused with [[DOCX]].
+
Its main purpose is to structure and lay out text, images, and links on a web page. Files with the '''.html''' or '''.htm''' extension are static web pages. Dynamic features that depend on a server or database require another language in addition to HTML. HTML files are [[TXT|plain text files]] whose contents follow the markup rules of the language.
  
== MIME types ==
+
HTML files are usually viewed using a [[Web Browser|web browser]], but can also be opened with a variety of other programs.
  
The following [[MIME types]] apply to this [[file format]]:
+
== XHTML ==
  
* application/msword
+
The '''Extensible Hypertext Markup Language''' ('''XHTML''') is similar in nature to HTML, but has a stricter [[XML]]-based syntax.
* application/doc
+
* appl/text
+
* application/vnd.msword
+
* application/vnd.ms-word
+
* application/winword
+
* application/word
+
* application/x-msw6
+
* application/x-msword
+
* zz-application/zz-winassoc-doc
+
  
== File Header ==
+
== External Links ==
  
[[Microsoft Word]] documents of versions 97 through 2003 use the [[OLE Compound File]] (OLECF) format as their container.
+
* [http://en.wikipedia.org/wiki/Html Wikipedia: HTML]
 
+
* [http://en.wikipedia.org/wiki/Xhtml Wikipedia: XHTML]
The Word Binary File format data is stored inside the OLECF container.
+
* [http://www.w3.org/TR/html401/ HTML 4.01 Specification]
 
+
* [http://www.w3.org/TR/xhtml11/ XHTML 1.1 Specification]
The object stream of an OLECF that contains a Word document includes the string "Word.Document" followed by a version number (for example "Word.Document.8").
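The two observations above suggest a quick heuristic for recognizing a Word 97-2003 file. The following is a minimal sketch, not a complete parser: the 8-byte OLECF signature is not given in the text above and is taken from the published compound file format, the function names are hypothetical, and the "Word.Document" marker is searched for anywhere in the file rather than in a specific stream.

```python
# 8-byte signature that opens an OLE Compound File
# (assumption: taken from the published format, not from this page).
OLECF_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_word_doc(data: bytes) -> bool:
    """Heuristic: OLECF signature at offset 0 plus a 'Word.Document' marker."""
    if not data.startswith(OLECF_SIGNATURE):
        return False
    # The version suffix after the marker varies, so only the prefix is matched.
    return b"Word.Document" in data


def check_file(path: str) -> bool:
    """Read a file from disk and apply the heuristic (hypothetical helper)."""
    with open(path, "rb") as handle:
        return looks_like_word_doc(handle.read())
```

A match only indicates that the file is probably a Word document inside an OLECF container; other OLECF-based formats carry different marker strings and would fail the second test.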
+
 
+
== Encryption ==
+
 
+
Word 97 and Word 2000 encrypt documents with a very weak scheme. Several products can break it easily, and the contents can be decrypted without ever recovering the password by testing all 1,099,511,627,776 (2<sup>40</sup>) possible keys. Ultimate Zip Cracker by VDGSoftware is one utility that can perform this decryption.
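The key count quoted above equals 2 to the power of 40, which is why an exhaustive search is practical. The sketch below only reproduces that arithmetic and estimates a worst-case search time; the testing rate passed to `worst_case_days` is a hypothetical figure chosen for illustration, not a measurement of any tool.

```python
KEY_BITS = 40                      # implied by the key count quoted above
SECONDS_PER_DAY = 86_400

keyspace = 2 ** KEY_BITS


def worst_case_days(keys_per_second: float) -> float:
    """Days needed to try every key at a given (assumed) testing rate."""
    return keyspace / keys_per_second / SECONDS_PER_DAY


print(f"{keyspace:,} possible keys")
# Hypothetical rate of one million keys per second.
print(f"{worst_case_days(1_000_000):.1f} days at 1,000,000 keys/s")
```

On average a search finds the key after trying about half of the keyspace, so the expected time is roughly half of the worst case.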
+
== See Also ==
+
 
+
[http://download.microsoft.com/download/0/B/E/0BE8BDD7-E5E8-422A-ABFD-4342ED7AD886/Word97-2007BinaryFileFormat(doc)Specification.pdf Word 97-2007 Binary File Format by Microsoft]
+
 
+
== Extracting Strings ==
+
 
+
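On a Unix-like machine, the following command extracts readable strings from a .doc file. Removing the NUL bytes first lets text stored with two bytes per character show up as ordinary strings: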
+
 
+
<code>
+
tr -d '\0' < /tmp/test.doc | strings | more
+
</code>
+
 
+
(where /tmp/test.doc is the path to your .doc file)
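Where the command-line tools are not available, the same idea can be approximated in a script. This is a rough sketch of the pipeline above, assuming printable ASCII is all that matters; the function name and the default minimum length of 4 are choices made for this example.

```python
import re


def extract_strings(data: bytes, min_length: int = 4) -> list[str]:
    """Approximate `tr -d '\\0' | strings` on a block of bytes."""
    # Dropping NUL bytes collapses two-byte-per-character text to plain ASCII.
    stripped = data.replace(b"\x00", b"")
    # Keep runs of printable ASCII that are at least min_length characters long.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(stripped)]


def extract_strings_from_file(path: str) -> list[str]:
    """Read a file and return its printable strings (hypothetical helper)."""
    with open(path, "rb") as handle:
        return extract_strings(handle.read())
```

As with the shell command, removing every NUL byte can join unrelated fragments, so the output should be treated as a lead for further examination rather than as an exact reconstruction of the document text.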
+
  
 
[[Category:File Formats]]
 
[[Category:File Formats]]

Revision as of 20:29, 10 March 2007

The Hypertext Markup Language (HTML) file format is used to create and display web pages.

Its main purpose is to structure and lay out text, images, and links on a web page. Files with the .html or .htm extension are static web pages. Dynamic features that depend on a server or database require another language in addition to HTML. HTML files are plain text files whose contents follow the markup rules of the language.

HTML files are usually viewed using a web browser, but can also be opened with a variety of other programs.

XHTML

The Extensible Hypertext Markup Language (XHTML) is similar in nature to HTML, but has a stricter XML-based syntax.
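One practical consequence of the stricter syntax is that XHTML must be well-formed XML, so every element has to be closed. The sketch below illustrates this with a generic XML parser; it checks well-formedness only, not conformance to any XHTML document type, and the sample fragments and function name are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML tolerates an unclosed <br>; an XML parser does not.
html_style = "<p>line one<br>line two</p>"
xhtml_style = "<p>line one<br />line two</p>"

print(is_well_formed_xml(html_style))
print(is_well_formed_xml(xhtml_style))
```

A web browser would typically render both fragments the same way, which is why HTML found during an examination often cannot be processed with XML tools without cleanup first.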

External Links