Difference between pages "Internet Explorer History File Format" and "Document Metadata Extraction"

From Forensics Wiki
(Difference between pages)
(File Header)
 
m (Office Files)
 
Line 1: Line 1:
{{Expand}}
+
Here are tools that will extract metadata from document files.
[[Internet Explorer]] versions 4 through 9 store the web browsing history in files named <tt>index.dat</tt>. Each file contains multiple records.
+
MSIE version 3 probably uses similar records in its History (Cache) files.
+
  
== File Locations ==
+
=Office Files=
  
Internet Explorer history files keep a record of URLs that the browser has visited, cookies that were created by these sites, and any temporary internet files that were downloaded by the site visit. As a result, Internet Explorer history files are kept in several locations. Regardless of the information stored in the file, the file is named index.dat.
+
; [[antiword]]
 +
: http://www.winfield.demon.nl/
  
On Windows 95/98 these files were located in the following locations:
+
; [[catdoc]]
<pre>
+
: http://www.45.free.net/~vitus/software/catdoc/
%systemdir%\Temporary Internet Files\Content.ie5
+
%systemdir%\Cookies
+
%systemdir%\History\History.ie5
+
</pre>
+
  
On Windows 2000/XP the file locations have changed:
+
; [[laola]]
<pre>
+
: http://user.cs.tu-berlin.de/~schwartz/pmh/index.html
%systemdir%\Documents and Settings\%username%\Local Settings\Temporary Internet Files\Content.ie5
+
%systemdir%\Documents and Settings\%username%\Cookies
+
%systemdir%\Documents and Settings\%username%\Local Settings\History\history.ie5
+
</pre>
+
  
On Windows Vista/7 the files are located in:
+
; [[word2x]]
<pre>
+
: http://word2x.sourceforge.net/
%systemdir%\Users\%username%\AppData\Local\Microsoft\Windows\Temporary Internet Files\
+
%systemdir%\Users\%username%\AppData\Local\Microsoft\Windows\Temporary Internet Files\Low\
+
</pre>
+
  
Internet Explorer also keeps daily, weekly, and monthly history logs that will be located in subfolders of %systemdir%\Documents and Settings\%username%\Local Settings\History\history.ie5. The folders will be named <tt>MSHist<two-digit number><starting four-digit year><starting two-digit month><starting two-digit day><ending four-digit year><ending two-digit month><ending two-digit day></tt>.  For example, the folder containing data from March 26, 2008 to March 27, 2008 might be named <tt>MSHist012008032620080327</tt>.
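The folder naming scheme above can be decoded mechanically. The following is a minimal sketch, assuming the name always has exactly the layout described (the prefix <tt>MSHist</tt> followed by 18 digits); the type <tt>mshist_range</tt> and the function <tt>parse_mshist_name</tt> are hypothetical names introduced for illustration and are not part of any existing tool.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: the date range encoded in an MSHist folder name. */
typedef struct {
    int index;                               /* two-digit number after "MSHist" */
    int start_year, start_month, start_day;  /* first day covered */
    int end_year, end_month, end_day;        /* last day covered */
} mshist_range;

/* Returns 0 on success, -1 if the name does not match the assumed layout. */
int parse_mshist_name(const char *name, mshist_range *out)
{
    size_t i;

    /* "MSHist" (6) + index (2) + start date (8) + end date (8) = 24 characters */
    if (strlen(name) != 24 || strncmp(name, "MSHist", 6) != 0)
        return -1;
    for (i = 6; i < 24; i++) {
        if (!isdigit((unsigned char)name[i]))
            return -1;
    }
    if (sscanf(name + 6, "%2d%4d%2d%2d%4d%2d%2d",
               &out->index,
               &out->start_year, &out->start_month, &out->start_day,
               &out->end_year, &out->end_month, &out->end_day) != 7)
        return -1;
    return 0;
}
```

For the example folder <tt>MSHist012008032620080327</tt> this yields the number 1, a start date of 2008-03-26 and an end date of 2008-03-27. The sketch does not check that the dates are valid calendar dates.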
+
; [[wvWare]]
 +
: http://wvware.sourceforge.net/
 +
: Extracts metadata from various [[Microsoft]] Word files ([[doc]]). Can also convert doc files to other formats such as HTML or plain text.
  
Note that not every file named index.dat is an MSIE History (Cache) file.
+
; [[Outside In]]
 +
: http://www.oracle.com/technology/products/content-management/oit/oit_all.html
 +
: Originally developed by Stellent, supports hundreds of file types.
  
== File Header ==
+
; [[FI Tools]]
Every version of Internet Explorer since Internet Explorer 5 has used the same structure for the file header and the individual records.  Internet Explorer history files begin with:
+
: http://forensicinnovations.com/
43 6c 69 65 6e 74 20 55 72 6c 43 61 63 68 65 20 4d 4d 46 20 56 65 72 20 35 2e 32
+
: More than 100 file types.
This represents the ASCII string "Client UrlCache MMF Ver 5.2".
+
  
The MSIE 4 index.dat files start with "Client UrlCache MMF Ver 4.7" and use a different version of the format.
+
=PDF Files=
  
The next field in the file header starts at byte offset 28 and is a four-byte file size. The value is stored in [[endianness | little-endian]] format, so the bytes must be read in reverse order to calculate the value.
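As an illustration of the two header fields described so far, the following sketch checks the signature prefix and reads the file size from offset 28. It assumes the whole header is already in memory; <tt>read_le32</tt> and <tt>msiecf_read_file_size</tt> are hypothetical helper names, and only the version-independent part of the signature is compared so that both the 4.7 and 5.2 strings are accepted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assemble a 32-bit value from four bytes stored least significant first. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Returns 0 and stores the file size on success, -1 if the buffer is too
 * short or does not start with the expected signature prefix. */
int msiecf_read_file_size(const unsigned char *buf, size_t len, uint32_t *size)
{
    static const char prefix[] = "Client UrlCache MMF Ver ";

    if (len < 32)   /* 28-byte signature field + 4-byte size */
        return -1;
    if (memcmp(buf, prefix, sizeof(prefix) - 1) != 0)
        return -1;
    *size = read_le32(buf + 28);
    return 0;
}
```

Comparing the stored size with the actual size of the file on disk is a cheap sanity check before parsing further.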
+
; [[xpdf]]
 +
: http://www.foolabs.com/xpdf/
 +
: [[pdfinfo]] (part of the [[xpdf]] package) displays some metadata of [[PDF]] files.
  
Also of interest in the file header is the location of the cache directories.  In the URL records the cache directories are given as a number, with one representing the first cache directory, two representing the second and so on.  The names of the cache directories are kept at byte offset 64 in the file.  Each directory entry is 12 bytes long of which the first eight bytes contain the directory name.
 
  
== Allocation bitmap ==
+
(See [[PDF]])
The IE History File contains an allocation bitmap starting from offset 0x250 to 0x4000.
+
  
== Record Formats ==
+
=Images=
  
Every record has a similar header that consists of 8 bytes.
+
; [[jhead]]
 +
: http://www.sentex.net/~mwandel/jhead/
 +
: Displays or modifies [[Exif]] data in [[JPEG]] files.
  
<pre>typedef struct _RECORD_HEADER {
+
; [[vinetto]]
  /* 000 */ char        Signature[4];
+
: http://vinetto.sourceforge.net/
  /* 004 */ uint32_t    NumberOfBlocksInRecord;
+
: Examines [[Thumbs.db]] files.
} RECORD_HEADER;</pre>
+
  
The size of a record can be determined from the number of blocks in the record; by default the block size is 128 bytes. Therefore, a length of <pre>05 00 00 00</pre> indicates five blocks (the number is stored in little-endian format) of 128 bytes each, for a total record length of 640 bytes. Note that even for allocated records the number-of-blocks value cannot be fully relied upon.
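The arithmetic above can be sketched as follows. This is an illustration only: it assumes the default block size of 128 bytes, and the names <tt>record_info</tt> and <tt>parse_record_header</tt> are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define DEFAULT_BLOCK_SIZE 128u   /* assumed default block size in bytes */

/* Hypothetical helper type holding the decoded 8-byte record header. */
typedef struct {
    char     signature[5];        /* 4 signature bytes plus a terminating NUL */
    uint32_t number_of_blocks;
    uint32_t size_in_bytes;
} record_info;

/* Decode the 8-byte record header: signature, then little-endian block count. */
void parse_record_header(const unsigned char hdr[8], record_info *out)
{
    memcpy(out->signature, hdr, 4);
    out->signature[4] = '\0';
    out->number_of_blocks = (uint32_t)hdr[4] |
                            ((uint32_t)hdr[5] << 8) |
                            ((uint32_t)hdr[6] << 16) |
                            ((uint32_t)hdr[7] << 24);
    out->size_in_bytes = out->number_of_blocks * DEFAULT_BLOCK_SIZE;
}
```

Because the block count cannot be fully relied upon, a parser would normally also check the computed size against the remaining file size before reading the record body.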
+
;[[libexif]]
 +
: http://sourceforge.net/projects/libexif EXIF tag Parsing Library
  
The blocks that make up a record can have slack space.  
+
; [[Adroit Photo Forensics]]
 +
: http://digital-assembly.com/products/adroit-photo-forensics/
 +
: Displays metadata and uses date and camera metadata for grouping, timelines, etc.
  
Currently 4 types of records are known:
+
=General=
* URL
+
These general-purpose programs frequently work when the special-purpose programs fail, but they generally provide less detailed information.
* REDR
+
* HASH
+
* LEAK
+
  
Note that the location and filename strings are stored in the local codepage. Normally these strings only use the ASCII character set, but Chinese versions of Windows are known to use extended characters as well.
+
; [[Metadata Extraction Tool]]
 +
: "Developed by the National Library of New Zealand to programmatically extract preservation metadata from a range of file formats like PDF documents, image files, sound files Microsoft office documents, and many others."
 +
: http://meta-extractor.sourceforge.net/
  
=== URL Records ===
+
; [[Metadata Assistant]]
 +
: http://www.payneconsulting.com/products/metadataent/
  
These records indicate URIs that were actually requested. They contain the location and additional data like the web server's HTTP response. They begin with the header, in hexadecimal:
+
; [[hachoir|hachoir-metadata]]
 +
: Extraction tool, part of '''[[Hachoir]]''' project
  
<pre>55 52 4C 20</pre>
+
; [[file]]
This corresponds to the string <tt>URL</tt> followed by a space.
+
: The UNIX '''file''' program can extract some metadata
  
The definition for the structure in C99 format:
+
; [[GNU libextractor]]
 +
: http://gnunet.org/libextractor/ The libextractor library is a pluggable system for extracting metadata
  
<pre>typedef struct _URL_RECORD_HEADER {
+
; [[Directory Lister Pro]]
  /* 000 */ char        Signature[4];
+
: Directory Lister Pro is a Windows tool that creates listings of files from selected directories on hard disks, CD-ROMs, DVD-ROMs, floppies, USB storage devices and network shares. Listings can be in HTML, text or CSV format (for easy import into Excel). A listing can contain standard file information such as file name, extension, type, owner and creation date. For forensic analysis, file metadata can also be extracted from various formats: 1) executable file information (EXE, DLL, OCX) such as file version, description, company and product name; 2) multimedia properties (MP3, AVI, WAV, JPG, GIF, BMP, MKV, MKA, MPEG) such as track, title, artist, album, genre, video format, bits per pixel, frames per second, audio format and bits per channel; 3) Microsoft Office files (DOC, DOCX, XLS, XLSX, PPT, PPTX) such as document title, author, keywords and word count. For each file and folder it is also possible to obtain its CRC32, MD5, SHA-1 and Whirlpool hash. An extensive set of options allows the visual look of the output to be completely customized. Filters on file name, date, size or attributes can be applied to limit the files listed.
  /* 004 */ uint32_t    AmountOfBlocksInRecord;
+
: http://www.krksoft.com
  /* 008 */ FILETIME    LastModified;
+
  /* 010 */ FILETIME    LastAccessed;
+
  /* 018 */ FATTIME    Expires;
+
  /* 01c */
+
  // Not finished yet
+
} URL_RECORD_HEADER;</pre>
+
  
<pre>
+
[[Category:Tools]]
typedef struct _FILETIME {
+
  /* 000 */ uint32_t    lower;
+
  /* 004 */ uint32_t    upper;
+
} FILETIME;</pre>
+
 
+
<pre>
+
typedef struct _FATTIME {
+
  /* 000 */ uint16_t    date;
+
  /* 002 */ uint16_t    time;
+
} FATTIME;</pre>
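The two timestamp types can be converted as sketched below. The sketch relies on the standard definitions of these formats rather than on anything specific to index.dat: a FILETIME counts 100-nanosecond intervals since January 1, 1601 (UTC), and a FAT date/time packs year, month, day, hour, minute and two-second units into bit fields. The names <tt>filetime_to_unix</tt>, <tt>fat_datetime</tt> and <tt>decode_fattime</tt> are hypothetical, and the sketch does not address how the values should be interpreted for each history file type.

```c
#include <stdint.h>

#define FILETIME_TICKS_PER_SECOND 10000000ULL
/* Seconds between 1601-01-01 and 1970-01-01 */
#define FILETIME_UNIX_EPOCH_DELTA 11644473600LL

/* Convert the two 32-bit halves of a FILETIME to seconds since the Unix epoch. */
int64_t filetime_to_unix(uint32_t lower, uint32_t upper)
{
    uint64_t ticks = ((uint64_t)upper << 32) | lower;
    return (int64_t)(ticks / FILETIME_TICKS_PER_SECOND) - FILETIME_UNIX_EPOCH_DELTA;
}

/* Hypothetical helper type for a decoded FAT date and time. */
typedef struct {
    int year, month, day;
    int hour, minute, second;
} fat_datetime;

/* Unpack the FAT date and time bit fields. */
fat_datetime decode_fattime(uint16_t fat_date, uint16_t fat_time)
{
    fat_datetime dt;
    dt.year   = 1980 + ((fat_date >> 9) & 0x7F);  /* bits 15-9: years since 1980 */
    dt.month  = (fat_date >> 5) & 0x0F;           /* bits 8-5 */
    dt.day    = fat_date & 0x1F;                  /* bits 4-0 */
    dt.hour   = (fat_time >> 11) & 0x1F;          /* bits 15-11 */
    dt.minute = (fat_time >> 5) & 0x3F;           /* bits 10-5 */
    dt.second = (fat_time & 0x1F) * 2;            /* bits 4-0: two-second units */
    return dt;
}
```

FAT timestamps only have two-second resolution, so the Expires field is coarser than the two FILETIME fields.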
+
 
+
The actual interpretation of the "LastModified" and "LastAccessed" fields depends on the type of history file in which the record is contained. As a matter of fact, Internet Explorer uses three different types of history files, namely Daily History, Weekly History, and Main History. Other "index.dat" files are used to store cached copies of visited pages and cookies.
+
Information on how to interpret the dates in these different files can be found on Capt. Steve Bunting's web page at the University of Delaware Computer Forensics Lab (http://www.stevebunting.org/udpd4n6/forensics/index_dat2.htm).
+
Please be aware that most free and/or open source index.dat parsing programs, as well as quite a few commercial forensic tools, are not able to correctly interpret the above dates. More specifically, they interpret all times and dates as if the records were contained in a Daily History file, regardless of the actual type of the file they are stored in.
+
 
+
=== REDR Records ===
+
REDR records are very simple records.  They simply indicate that the browser was redirected to another site.  REDR records always start with the string REDR (0x52 45 44 52).  The next four bytes are the size of the record in little-endian format.  The size is given as the number of 128-byte blocks.
+
 
+
At offset 8 from the start of the REDR record is an unknown data field.  It has been confirmed that this is not a date field.
+
 
+
16 bytes into the REDR record is the URL that was visited in a null-terminated string.  After the URL, the REDR record appears to be padded with zeros until the end of the 128 byte block.
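Based on the layout described above, locating the URL in a REDR record can be sketched as follows. The function name <tt>redr_record_url</tt> is hypothetical, and the sketch assumes the caller has already read the complete record into memory.

```c
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the NUL-terminated URL inside a REDR record, or NULL
 * if the signature does not match or no terminator is found within len bytes. */
const char *redr_record_url(const unsigned char *rec, size_t len)
{
    if (len < 16 || memcmp(rec, "REDR", 4) != 0)
        return NULL;
    /* The URL starts 16 bytes into the record; make sure it is terminated
     * before the end of the buffer so the caller can treat it as a C string. */
    if (memchr(rec + 16, '\0', len - 16) == NULL)
        return NULL;
    return (const char *)(rec + 16);
}
```

The returned pointer refers into the caller's buffer, so the buffer must stay valid for as long as the URL is used.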
+
 
+
=== HASH Records ===
+
 
+
=== LEAK Records ===
+
The exact purpose of LEAK records remains unknown; however, research performed by Mike Murr suggests that LEAK records are created when the machine attempts to delete records from the history file while a corresponding Temporary Internet File (TIF) is held open and cannot be deleted.
+
 
+
== See Also ==
+
 
+
* [[Internet Explorer]]
+
 
+
== External Links ==
+
 
+
* [http://www.milincorporated.com/a3_index.dat.html What is in Index.dat files]
+
* [http://code.google.com/p/libmsiecf/downloads/detail?name=MSIE%20Cache%20File%20%28index.dat%29%20format.pdf MSIE Cache File (index.dat) format specification], by the [[libmsiecf|libmsiecf project]]
+
* [http://kb.digital-detective.co.uk/display/NetAnalysis1/Internet+Explorer Digital Detective Knowledge Base: Internet Explorer]
+
* [http://web.archive.org/web/20090605202325/http://128.175.24.251/forensics/index_dat1.htm Understanding index.dat Files - Part 1], by Stephen M. Bunting
+
* [http://web.archive.org/web/20090605200839/http://128.175.24.251/forensics/index_dat2.htm Understanding index.dat Files - Part 2], by Stephen M. Bunting
+
* [http://web.archive.org/web/20090824054415/http://www.foundstone.com/us/pdf/wp_index_dat.pdf Detailed analysis of index.dat file format], by Keith J. Jones, March 19, 2003
+
* [http://www.forensicblog.org/2009/09/10/the-meaning-of-leak-records/ The Meaning of LEAK records], [[Mike Murr]], September 10, 2009
+
* [http://blog.digital-detective.co.uk/2010/04/microsoft-internet-explorer-privacie.html Microsoft Internet Explorer PrivacIE Entries], by Digital Detective, April 29, 2010
+
* [http://blogs.msdn.com/b/ieinternals/archive/2011/03/19/wininet-temporary-internet-files-cache-and-explorer-folder-view.aspx A Primer on Temporary Internet Files], by Eric Lawrence, March 19, 2011
+
 
+
== Tools ==
+
* [http://www.cqure.net/wp/iehist/ IEHist]
+
* [[libmsiecf]]
+
* [https://sourceforge.net/projects/odessa/ pasco], note this tool has not been updated since 2004 and is considered deprecated
+
* [https://sourceforge.net/projects/pasco2/ pasco2]
+
* [http://www.tzworks.net/prototype_page.php?proto_id=6 Windows 'index.dat' Parser (id)], by [[TZWorks LLC]]
+
 
+
[[Category:File Formats]]
+

Revision as of 00:01, 13 January 2010

Here are tools that will extract metadata from document files.


Office Files

antiword
http://www.winfield.demon.nl/
catdoc
http://www.45.free.net/~vitus/software/catdoc/
laola
http://user.cs.tu-berlin.de/~schwartz/pmh/index.html
word2x
http://word2x.sourceforge.net/
wvWare
http://wvware.sourceforge.net/
Extracts metadata from various Microsoft Word files (doc). Can also convert doc files to other formats such as HTML or plain text.
Outside In
http://www.oracle.com/technology/products/content-management/oit/oit_all.html
Originally developed by Stellent, supports hundreds of file types.
FI Tools
http://forensicinnovations.com/
More than 100 file types.

PDF Files

xpdf
http://www.foolabs.com/xpdf/
pdfinfo (part of the xpdf package) displays some metadata of PDF files.


(See PDF)

Images

jhead
http://www.sentex.net/~mwandel/jhead/
Displays or modifies Exif data in JPEG files.
vinetto
http://vinetto.sourceforge.net/
Examines Thumbs.db files.
libexif
http://sourceforge.net/projects/libexif EXIF tag Parsing Library
Adroit Photo Forensics
http://digital-assembly.com/products/adroit-photo-forensics/
Displays metadata and uses date and camera metadata for grouping, timelines, etc.

General

These general-purpose programs frequently work when the special-purpose programs fail, but they generally provide less detailed information.

Metadata Extraction Tool
"Developed by the National Library of New Zealand to programmatically extract preservation metadata from a range of file formats like PDF documents, image files, sound files Microsoft office documents, and many others."
http://meta-extractor.sourceforge.net/
Metadata Assistant
http://www.payneconsulting.com/products/metadataent/
hachoir-metadata
Extraction tool, part of Hachoir project
file
The UNIX file program can extract some metadata
GNU libextractor
http://gnunet.org/libextractor/ The libextractor library is a pluggable system for extracting metadata
Directory Lister Pro
Directory Lister Pro is a Windows tool that creates listings of files from selected directories on hard disks, CD-ROMs, DVD-ROMs, floppies, USB storage devices and network shares. Listings can be in HTML, text or CSV format (for easy import into Excel). A listing can contain standard file information such as file name, extension, type, owner and creation date. For forensic analysis, file metadata can also be extracted from various formats: 1) executable file information (EXE, DLL, OCX) such as file version, description, company and product name; 2) multimedia properties (MP3, AVI, WAV, JPG, GIF, BMP, MKV, MKA, MPEG) such as track, title, artist, album, genre, video format, bits per pixel, frames per second, audio format and bits per channel; 3) Microsoft Office files (DOC, DOCX, XLS, XLSX, PPT, PPTX) such as document title, author, keywords and word count. For each file and folder it is also possible to obtain its CRC32, MD5, SHA-1 and Whirlpool hash. An extensive set of options allows the visual look of the output to be completely customized. Filters on file name, date, size or attributes can be applied to limit the files listed.
http://www.krksoft.com