Difference between pages "Bulk extractor" and "Windows Prefetch File Format"

From ForensicsWiki
== Overview ==
'''bulk_extractor''' is a computer forensics tool that scans a disk image, a file, or a directory of files and extracts useful information without parsing the file system or file system structures. The results can be easily inspected, parsed, or processed with automated tools. '''bulk_extractor''' also creates histograms of the features it finds, because features that are more common tend to be more important. The program can be used for law enforcement, defense, intelligence, and cyber-investigation applications.

bulk_extractor is distinguished from other forensic tools by its speed and thoroughness. Because it ignores file system structure, bulk_extractor can process different parts of the disk in parallel. In practice, the program splits the disk into 16 MiB pages and processes one page on each available core, so a 24-core machine can process a disk roughly 24 times faster than a 1-core machine. bulk_extractor is also thorough, because it automatically detects, decompresses, and recursively re-processes data compressed with a variety of algorithms. Our testing has shown that the unallocated regions of file systems contain a significant amount of compressed data that is missed by most forensic tools in common use today.

Another advantage of ignoring file systems is that bulk_extractor can be used to process any digital media. We have used the program to process hard drives, SSDs, optical media, camera cards, cell phones, network packet dumps, and other kinds of digital information.

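The page-parallel design described above can be illustrated with a toy Python sketch. This is not bulk_extractor's code. The e-mail regular expression, the names `page_ranges`, `scan_page` and `scan_image`, and the use of a thread pool are assumptions made for illustration. The sketch tiles an in-memory image into fixed-size pages without parsing any file system, scans the pages concurrently, and reports each feature with its absolute byte offset.

```python
import re
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 16 * 1024 * 1024  # 16 MiB, the page size described above

# Hypothetical, deliberately simple e-mail pattern used only for illustration.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def page_ranges(total_size, page_size=PAGE_SIZE):
    """Yield (start, end) byte ranges that tile the image; no file system is parsed."""
    for start in range(0, total_size, page_size):
        yield start, min(start + page_size, total_size)


def scan_page(data, start, end):
    """Scan one page and return (absolute offset, feature) pairs."""
    page = data[start:end]
    return [(start + m.start(), m.group().decode("ascii")) for m in EMAIL_RE.finditer(page)]


def scan_image(data, page_size=PAGE_SIZE, workers=4):
    """Scan all pages concurrently and merge the hits in offset order.

    Features that straddle a page boundary are missed by this toy version."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scan_page, data, s, e)
                   for s, e in page_ranges(len(data), page_size)]
        results = [hit for f in futures for hit in f.result()]
    return sorted(results)
```

In CPython, regular-expression matching generally holds the global interpreter lock. A real multi-core speed-up would therefore need something like a `ProcessPoolExecutor` instead of threads.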
  
==Output Feature Files==
  
bulk_extractor now creates an output directory that includes:
* '''ccn.txt''' -- Credit card numbers.
* '''ccn_track2.txt''' -- Credit card “track 2” information.
* '''domain.txt''' -- Internet domains found on the drive, including dotted-quad addresses found in text.
* '''email.txt''' -- Email addresses.
* '''ether.txt''' -- Ethernet MAC addresses found through IP packet carving of swap files, compressed system hibernation files, and file fragments.
* '''exif.txt''' -- EXIF data from JPEGs and video segments. This feature file contains all of the EXIF fields, expanded as XML records.
* '''find.txt''' -- The results of specific regular expression search requests.
* '''ip.txt''' -- IP addresses found through IP packet carving.
* '''telephone.txt''' -- US and international telephone numbers.
* '''url.txt''' -- URLs, typically found in browser caches, email messages, and pre-compiled into executables.
* '''url_searches.txt''' -- A histogram of terms used in Internet searches from services such as Google, Bing, Yahoo, and others.
* '''wordlist.txt''' -- A list of all “words” extracted from the disk, useful for password cracking.
* '''wordlist_*.txt''' -- The wordlist with duplicates removed, formatted so that it can be easily imported into a popular password-cracking program.
* '''zip.txt''' -- Information about every ZIP file component found on the media. This is exceptionally useful because ZIP files contain internal structure, and ZIP is increasingly the compound file format of choice for a variety of products such as Microsoft Office.
  
For each of the above, two additional files may be created:
* '''*_stopped.txt''' -- bulk_extractor supports a stop list, a list of items that do not need to be brought to the user’s attention. However, rather than simply suppressing this information, which might cause something critical to be hidden, stopped entries are stored in the stopped files.
* '''*_histogram.txt''' -- bulk_extractor can also create histograms of features. This is important because experience has shown that email addresses, domain names, URLs, and other information that appear more frequently on a hard drive or in a cell phone’s memory can be used to rapidly create a pattern-of-life report.
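As a rough illustration of how a feature histogram can be built from a feature file, the following Python sketch counts feature occurrences. The line layout it assumes should be checked against real output: tab-separated offset, feature and optional context, with lines starting with `#` treated as comments. `read_features` and `feature_histogram` are hypothetical helpers, not part of bulk_extractor.

```python
from collections import Counter


def read_features(lines):
    """Yield (offset, feature, context) tuples from feature-file lines.

    Assumed layout: tab-separated offset, feature and optional context;
    lines starting with '#' are comments. Offsets are kept as strings so
    the sketch does not commit to any particular offset syntax."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed lines rather than guessing
        context = fields[2] if len(fields) > 2 else ""
        yield fields[0], fields[1], context


def feature_histogram(lines):
    """Return (feature, count) pairs, most frequent first."""
    return Counter(feature for _, feature, _ in read_features(lines)).most_common()
```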
  
bulk_extractor also creates a file that captures the provenance of the run:
;report.xml
:A Digital Forensics XML (DFXML) report that includes information about the source media, how the bulk_extractor program was compiled and run, the time taken to process the digital evidence, and a meta report of the information that was found.
  
==Post-Processing==
  
We have developed four programs for post-processing the bulk_extractor output:
;bulk_diff.py
:This program reports the differences between two bulk_extractor runs. The intent is to image a computer, run bulk_extractor on the disk image, let the computer run for a period of time, re-image the computer, run bulk_extractor on the second image, and then report the differences. This can be used to infer the user’s activities within a time period.
;cda_tool.py
:This tool, currently under development, reads multiple bulk_extractor reports from runs against multiple drives and performs a multi-drive correlation using Garfinkel’s Cross-Drive Analysis technique. This can be used to automatically identify new social networks or to identify new members of existing networks.
;identify_filenames.py
:In a bulk_extractor feature file, each feature is annotated with the byte offset from the beginning of the image at which it was found. The program takes as input a bulk_extractor feature file and a DFXML file containing the locations of each file on the drive (produced with Garfinkel’s fiwalk program). It produces an annotated feature file that contains the offset, the feature, and the file in which the feature was found.
;make_context_stop_list.py
:Forensic analysts frequently make “stop lists”, for example a list of email addresses that appear in the operating system and should therefore be ignored. Such lists have a significant problem: because it is relatively easy to get an email address into the binary of an open source application, ignoring all of these email addresses may make it possible to cloak email addresses from forensic analysis. Our solution is to create context-sensitive stop lists, in which the feature to be stopped is presented with the context in which it occurs. The make_context_stop_list.py program takes the results of multiple bulk_extractor runs and creates a single context-sensitive stop list. This list can then be used to suppress features when they are found in a specific context. One such stop list, constructed from Windows and Linux operating systems, is available on the bulk_extractor website.
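The core lookup behind identify_filenames.py can be sketched as an interval search: given the image byte runs of each file, find the run that contains a feature's offset. This Python sketch is a hypothetical stand-in, not the program itself:
* runs are plain `(image_offset, length, filename)` tuples rather than parsed DFXML;
* runs are assumed not to overlap;
* offsets are plain integers.

```python
import bisect


class ByteRunIndex:
    """Map absolute image offsets to the file whose byte run contains them.

    Hypothetical stand-in for the DFXML file locations: each run is
    (image_offset, length, filename), and runs are assumed not to overlap."""

    def __init__(self, runs):
        self.runs = sorted(runs)  # sorted by image offset
        self.starts = [start for start, _, _ in self.runs]

    def lookup(self, offset):
        """Return the filename covering offset, or None (e.g. unallocated space)."""
        i = bisect.bisect_right(self.starts, offset) - 1
        if i >= 0:
            start, length, name = self.runs[i]
            if offset < start + length:
                return name
        return None


def annotate(features, index):
    """Attach a filename (or None) to each (offset, feature) pair."""
    return [(offset, feature, index.lookup(offset)) for offset, feature in features]
```

Sorting the runs once and using binary search keeps each lookup logarithmic in the number of runs. This matters when a feature file holds millions of entries.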
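A context-sensitive stop list can be sketched as a set of `(feature, context)` pairs, where a feature is stopped only when it appears in a stopped context. In this hypothetical Python sketch the match is exact on both parts. That is an assumption, not the matching rule used by make_context_stop_list.py or bulk_extractor. Stopped features are returned separately rather than discarded, mirroring the `*_stopped.txt` behaviour described above.

```python
def apply_context_stop_list(features, stop_pairs):
    """Split (offset, feature, context) tuples into (kept, stopped) lists.

    A feature is stopped only if the exact (feature, context) pair is in
    stop_pairs, so the same address seen in a new context is still reported."""
    kept, stopped = [], []
    for offset, feature, context in features:
        target = stopped if (feature, context) in stop_pairs else kept
        target.append((offset, feature, context))
    return kept, stopped
```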
  
== Download ==
The current version of '''bulk_extractor''' is 1.4.4.

* Downloads are available at: http://digitalcorpora.org/downloads/bulk_extractor/
* A Windows installer with the GUI (version 1.4.1) can be downloaded from: http://www.digitalcorpora.org/downloads/bulk_extractor/bulk_extractor-1.4.1-windowsinstaller.exe
  
== Bibliography ==
=== Academic Publications ===
# Garfinkel, Simson, [http://simson.net/clips/academic/2013.COSE.bulk_extractor.pdf "Digital media triage with bulk data analysis and bulk_extractor"], Computers and Security 32: 56-72 (2013).
# Beverly, Robert, Simson Garfinkel and Greg Cardwell, [http://simson.net/clips/academic/2011.DFRWS.ipcarving.pdf "Forensic Carving of Network Packets and Associated Data Structures"], DFRWS 2011, Aug. 1-3, 2011, New Orleans, LA. Best paper award (acceptance rate: 23%, 14/62).
# Garfinkel, Simson, [http://simson.net/clips/academic/2006.DFRWS.pdf "Forensic Feature Extraction and Cross-Drive Analysis"], The 6th Annual Digital Forensic Research Workshop, Lafayette, Indiana, August 14-16, 2006 (acceptance rate: 43%, 16/37).
  
===YouTube===
'''[http://www.youtube.com/results?search_query=bulk_extractor search YouTube] for bulk_extractor videos'''
* [http://www.youtube.com/watch?v=odvDTGA7rYI Simson Garfinkel speaking at CERIAS about bulk_extractor]
* [http://www.youtube.com/watch?v=wTBHM9DeLq4 BackTrack 5 with bulk_extractor]
* [http://www.youtube.com/watch?v=QVfYOvhrugg Ubuntu 12.04 forensics with bulk_extractor]
* [http://www.youtube.com/watch?v=57RWdYhNvq8 Social Network forensics with bulk_extractor]
  
===Tutorials===
# [http://simson.net/ref/2012/2012-08-08%20bulk_extractor%20Tutorial.pdf Using bulk_extractor for digital forensics triage and cross-drive analysis], DFRWS 2012

Revision as of 14:37, 22 June 2014


A Windows Prefetch file consists of one file header and multiple file sections with different content. Not all content has an obvious forensic value.

As far as it has been possible to ascertain, there is no public description of the format. The description below has been synthesised from the examination of multiple prefetch files.

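== Characteristics ==
{| class="wikitable"
|-
| <b>Integers</b> || Stored in little-endian byte order.
|-
| <b>Strings</b> || Stored as [http://en.wikipedia.org/wiki/UTF-16/UCS-2 UTF-16 little-endian] without a byte-order mark (BOM).
|-
| <b>Timestamps</b> || Stored as [http://msdn2.microsoft.com/en-us/library/ms724284.aspx Windows FILETIME] in UTC.
|}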
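Every timestamp in the format is a FILETIME, so a small conversion helper is useful when reading the structures below. In this minimal Python sketch the function names are illustrative. FILETIME counts 100-nanosecond intervals since 1 January 1601 UTC. The sketch treats a zero value as "not set", which is an assumption.

```python
import struct
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals since 1601-01-01 00:00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(value):
    """Convert a 64-bit FILETIME integer to an aware UTC datetime.

    0 is returned as None; treating it as "not set" is an assumption of this sketch."""
    if value == 0:
        return None
    # Whole microseconds only: the final 100 ns digit is truncated.
    return FILETIME_EPOCH + timedelta(microseconds=value // 10)


def read_filetime(data, offset):
    """Read a little-endian FILETIME at offset and convert it."""
    (value,) = struct.unpack_from("<Q", data, offset)
    return filetime_to_datetime(value)
```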

== File header ==
The file header is 84 bytes in size and consists of:
{| class="wikitable"
|-
! Field !! Offset !! Length !! Type !! Notes
|-
| H1 || 0x0000 || 4 || DWORD || Format version (see the format version section below)
|-
| H2 || 0x0004 || 4 || DWORD || Signature 'SCCA' (in hexadecimal: 0x53 0x43 0x43 0x41)
|-
| H3 || 0x0008 || 4 || DWORD? || Unknown. Values observed: 0x0F (Windows XP); 0x11 (Windows 7, Windows 8.1)
|-
| H4 || 0x000C || 4 || DWORD || Prefetch file size (or length), sometimes referred to as End of File (EOF)
|-
| H5 || 0x0010 || 60 || USTR || The name of the (original) executable as a Unicode (UTF-16 little-endian) string of up to 29 characters, terminated by an end-of-string character (U+0000). This name should correspond to the one in the prefetch file's filename.
|-
| H6 || 0x004C || 4 || DWORD || The prefetch hash. This hash value should correspond to the one in the prefetch file's filename.
|-
| H7 || 0x0050 || 4 || ? || Unknown (flags?). Values observed: 0 for almost all prefetch files (XP); 1 for NTOSBOOT-B00DFAAD.pf (XP)
|}

Note that the name of a carved prefetch file can be restored using the information in fields H5 and H6, and its size can be determined from field H4.

=== Format version ===
{| class="wikitable"
|-
! Value !! Windows version
|-
| 17 (0x11) || Windows XP, Windows 2003
|-
| 23 (0x17) || Windows Vista, Windows 7
|-
| 26 (0x1a) || Windows 8.1 (possibly also Windows 8, but this has not been confirmed)
|}
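The header layout and format version values above can be combined into a small parser. This Python sketch uses illustrative names, not those of an existing library. It:
* checks the 'SCCA' signature;
* reads the executable name up to its end-of-string character;
* rebuilds the `NAME-HASH.pf` filename of a carved file from H5 and H6, as noted above.

Formatting the hash as eight upper-case hexadecimal digits follows the NTOSBOOT-B00DFAAD.pf example and is an assumption.

```python
import struct

HEADER_SIZE = 84
SIGNATURE = b"SCCA"
# Format versions from the table above.
FORMAT_VERSIONS = {
    17: "Windows XP, Windows 2003",
    23: "Windows Vista, Windows 7",
    26: "Windows 8.1",
}


def parse_header(data):
    """Parse the 84-byte prefetch file header (fields H1 to H7)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("too short for a prefetch file header")
    # H1 version, H2 signature, H3 unknown, H4 file size
    version, signature, _h3, file_size = struct.unpack_from("<I4sII", data, 0)
    if signature != SIGNATURE:
        raise ValueError("missing SCCA signature")
    # H5: 60 bytes of UTF-16LE; stop at the end-of-string character
    name = data[0x10:0x4C].decode("utf-16-le").split("\x00", 1)[0]
    prefetch_hash, _h7 = struct.unpack_from("<II", data, 0x4C)  # H6, H7
    return {
        "version": version,
        "windows": FORMAT_VERSIONS.get(version, "unknown"),
        "file_size": file_size,
        "executable": name,
        "hash": prefetch_hash,
    }


def carved_filename(header):
    """Rebuild the NAME-HASH.pf filename of a carved file from H5 and H6."""
    return "{}-{:08X}.pf".format(header["executable"], header["hash"])
```

When carving, H4 (`file_size`) gives the number of bytes to copy, counted from the start of the header.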

=== File information ===
The format of the file information is version dependent.

Note that some other format specifications consider the file information part of the file header.

==== File information - version 17 ====
The file information (version 17) is 68 bytes in size and consists of:
{| class="wikitable"
|-
! Field !! Offset !! Length !! Type !! Notes
|-
| || 0x0054 || 4 || DWORD || The offset to section A, relative to the start of the file.
|-
| || 0x0058 || 4 || DWORD || The number of entries in section A.
|-
| || 0x005C || 4 || DWORD || The offset to section B, relative to the start of the file.
|-
| || 0x0060 || 4 || DWORD || The number of entries in section B.
|-
| || 0x0064 || 4 || DWORD || The offset to section C, relative to the start of the file.
|-
| || 0x0068 || 4 || DWORD || The length of section C.
|-
| || 0x006C || 4 || DWORD || The offset to section D, relative to the start of the file.
|-
| || 0x0070 || 4 || DWORD || The number of entries in section D.
|-
| || 0x0074 || 4 || DWORD || The length of section D.
|-
| || 0x0078 || 8 || FILETIME || Latest execution time (or run time) of the executable
|-
| || 0x0080 || 16 || ? || Unknown, possibly structured as 4 DWORDs. Observed values: 0x00000000 0x00000000 0x00000000 0x00000000 and 0x47868c00 0x00000000 0x47860c00 0x00000000 (this may be remnant data)
|-
| || 0x0090 || 4 || DWORD || Execution counter (or run count)
|-
| || 0x0094 || 4 || DWORD? || Unknown. Observed values: 1, 2, 3, 4, 5, 6 (XP)
|}

==== File information - version 23 ====
The file information (version 23) is 156 bytes in size and consists of:
{| class="wikitable"
|-
! Field !! Offset !! Length !! Type !! Notes
|-
| || 0x0054 || 4 || DWORD || The offset to section A, relative to the start of the file.
|-
| || 0x0058 || 4 || DWORD || The number of entries in section A.
|-
| || 0x005C || 4 || DWORD || The offset to section B, relative to the start of the file.
|-
| || 0x0060 || 4 || DWORD || The number of entries in section B.
|-
| || 0x0064 || 4 || DWORD || The offset to section C, relative to the start of the file.
|-
| || 0x0068 || 4 || DWORD || The length of section C.
|-
| || 0x006C || 4 || DWORD || The offset to section D, relative to the start of the file.
|-
| || 0x0070 || 4 || DWORD || The number of entries in section D.
|-
| || 0x0074 || 4 || DWORD || The length of section D.
|-
| || 0x0078 || 8 || ? || Unknown
|-
| || 0x0080 || 8 || FILETIME || Latest execution time (or run time) of the executable
|-
| || 0x0088 || 16 || ? || Unknown
|-
| || 0x0098 || 4 || DWORD || Execution counter (or run count)
|-
| || 0x009C || 4 || DWORD? || Unknown
|-
| || 0x00A0 || 80 || ? || Unknown
|}

==== File information - version 26 ====
The file information (version 26) is 224 bytes in size and consists of:
{| class="wikitable"
|-
! Field !! Offset !! Length !! Type !! Notes
|-
| || 0x0054 || 4 || DWORD || The offset to section A, relative to the start of the file.
|-
| || 0x0058 || 4 || DWORD || The number of entries in section A.
|-
| || 0x005C || 4 || DWORD || The offset to section B, relative to the start of the file.
|-
| || 0x0060 || 4 || DWORD || The number of entries in section B.
|-
| || 0x0064 || 4 || DWORD || The offset to section C, relative to the start of the file.
|-
| || 0x0068 || 4 || DWORD || The length of section C.
|-
| || 0x006C || 4 || DWORD || The offset to section D, relative to the start of the file.
|-
| || 0x0070 || 4 || DWORD || The number of entries in section D.
|-
| || 0x0074 || 4 || DWORD || The length of section D.
|-
| || 0x0078 || 8 || ? || Unknown
|-
| || 0x0080 || 8 || FILETIME || Latest execution time (or run time) of the executable
|-
| || 0x0088 || 7 x 8 = 56 || FILETIME || Seven older execution times (or run times) of the executable, presumably stored most recent first
|-
| || 0x00C0 || 16 || ? || Unknown
|-
| || 0x00D0 || 4 || DWORD || Execution counter (or run count)
|-
| || 0x00D4 || 4 || ? || Unknown
|-
| || 0x00D8 || 4 || ? || Unknown
|-
| || 0x00DC || 88 || ? || Unknown
|}
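The three file information layouts share the section locations and differ mainly in where the run time(s) and run count are stored. A table-driven reader therefore keeps the version differences in one place. This minimal Python sketch uses the offsets from the tables above. `LAYOUTS` and `parse_file_information` are illustrative names. Run times are returned as raw FILETIME integers, and zero slots are skipped on the assumption that they are unused.

```python
import struct

# Per-version offsets taken from the file information tables above.
LAYOUTS = {
    17: {"last_run": 0x78, "older_runs": 0, "run_count": 0x90},
    23: {"last_run": 0x80, "older_runs": 0, "run_count": 0x98},
    26: {"last_run": 0x80, "older_runs": 7, "run_count": 0xD0},
}


def parse_file_information(data, version):
    """Read section locations, run times and run count for a known format version."""
    layout = LAYOUTS[version]  # KeyError for versions not described on this page
    (a_off, a_count, b_off, b_count, c_off, c_len,
     d_off, d_count, d_len) = struct.unpack_from("<9I", data, 0x54)
    times = [struct.unpack_from("<Q", data, layout["last_run"])[0]]
    # Version 26 stores seven older run times directly after the latest one (0x88).
    for i in range(layout["older_runs"]):
        times.append(struct.unpack_from("<Q", data, 0x88 + 8 * i)[0])
    (run_count,) = struct.unpack_from("<I", data, layout["run_count"])
    return {
        "section_a": (a_off, a_count),
        "section_b": (b_off, b_count),
        "section_c": (c_off, c_len),
        "section_d": (d_off, d_count, d_len),
        "run_times": [t for t in times if t != 0],  # raw FILETIME values
        "run_count": run_count,
    }
```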

== Section A - Metrics array ==
This section contains an array of metrics entry records that are 20 bytes (version 17) or 32 bytes (versions 23 and 26) in size.

The table below describes the 32-byte metrics entry record (versions 23 and 26):
{| class="wikitable"
|-
! Field !! Offset !! Length !! Type !! Notes
|-
| || 0 || 4 || DWORD || Start time in ms
|-
| || 4 || 4 || DWORD || Duration in ms
|-
| || 8 || 4 || DWORD || Average duration in ms?
|-
| || 12 || 4 || DWORD || Filename string offset<br>The offset is relative to the start of the filename strings section (section C)
|-
| || 16 || 4 || DWORD || Number of characters in the filename string, excluding the end-of-string character
|-
| || 20 || 4 || DWORD || Unknown, flags?
|-
| || 24 || 8 || || NTFS file reference
|}
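The following sketch reads the 32-byte metrics entries and resolves each filename through section C, using the field names from the table above. Only versions 23 and 26 are handled. The 20-byte version 17 record is not described field by field here, so the sketch rejects it. Splitting the NTFS file reference into a 48-bit MFT entry number and a 16-bit sequence number is standard NTFS. `parse_metrics` is an illustrative name.

```python
import struct

METRICS_ENTRY_SIZE = 32  # versions 23 and 26


def parse_metrics(data, a_offset, a_count, c_offset, version=23):
    """Return (filename, start_ms, duration_ms, mft_entry, sequence) per metrics entry."""
    if version not in (23, 26):
        raise ValueError("only the 32-byte records of versions 23 and 26 are handled")
    entries = []
    for i in range(a_count):
        base = a_offset + i * METRICS_ENTRY_SIZE
        (start_ms, duration_ms, _avg_ms, name_off, name_chars,
         _flags, file_ref) = struct.unpack_from("<6IQ", data, base)
        # Filename offsets are relative to the start of section C.
        start = c_offset + name_off
        name = data[start:start + 2 * name_chars].decode("utf-16-le")
        # NTFS file reference: low 48 bits = MFT entry, high 16 bits = sequence number.
        entries.append((name, start_ms, duration_ms,
                        file_ref & 0xFFFFFFFFFFFF, file_ref >> 48))
    return entries
```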

== Section B - Trace chains array ==
This section contains an array of 12-byte entry records (versions 17, 23 and 26).
{| class="wikitable"
|-
! Field !! Offset !! Length !! Type !! Notes
|-
| || 0 || 4 || || Next array entry index<br>Contains the index of the next trace chain array entry in the chain, where the first entry has index 0, or -1 (0xffffffff) for end-of-chain.
|-
| || 4 || 4 || || Total block load count<br>Number of blocks loaded (or fetched)<br>The block size is 512 KiB (512 x 1024 bytes).
|-
| || 8 || 1 || || Unknown
|-
| || 9 || 1 || || Sample duration in ms?
|-
| || 10 || 2 || || Unknown
|}
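Following a trace chain means repeatedly jumping to the next array entry index until the end-of-chain value -1 (0xffffffff) is reached. This Python sketch assumes the caller already knows the index of the first entry, because this page does not describe how a file's chain start is located. It also guards against out-of-range indices and loops, which carved or damaged files could contain. `read_trace_chain` and `bytes_loaded` are illustrative names.

```python
import struct

TRACE_ENTRY_SIZE = 12
END_OF_CHAIN = 0xFFFFFFFF  # -1 stored as an unsigned 32-bit value
BLOCK_SIZE = 512 * 1024


def read_trace_chain(data, b_offset, b_count, first_index):
    """Follow next-index links from first_index; return (index, blocks_loaded) pairs."""
    chain, seen = [], set()
    index = first_index
    while index != END_OF_CHAIN:
        if index >= b_count or index in seen:
            raise ValueError("corrupt trace chain at index %d" % index)
        seen.add(index)
        next_index, blocks = struct.unpack_from("<II", data, b_offset + index * TRACE_ENTRY_SIZE)
        chain.append((index, blocks))
        index = next_index
    return chain


def bytes_loaded(chain):
    """Total bytes fetched along a chain, using the 512 KiB block size above."""
    return sum(blocks for _, blocks in chain) * BLOCK_SIZE
```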

== Section C - Filename strings ==
This section contains the filename strings. It consists of an array of UTF-16 little-endian strings, each terminated by an end-of-string character (U+0000).

At the end of the section there appears to be alignment padding, which can contain remnant values.
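Section C can also be split directly into strings. The sketch below returns each string's byte offset so it can be matched against the filename string offsets in section A. In this illustrative Python helper, empty strings produced by zero padding are skipped. Non-zero remnant values in the padding would still appear as spurious trailing strings, so resolving names through the metrics entries is likely the more reliable route.

```python
def split_filename_strings(section_c):
    """Return (byte offset within section C, filename) pairs."""
    data = section_c[: len(section_c) - len(section_c) % 2]  # whole UTF-16 code units only
    result, start = [], 0
    for pos in range(0, len(data), 2):
        if data[pos:pos + 2] == b"\x00\x00":  # end-of-string character U+0000
            if pos > start:
                result.append((start, data[start:pos].decode("utf-16-le", errors="replace")))
            start = pos + 2
    return result
```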

== Section D - Volumes information (block) ==
Section D contains one or more subsections, each of which refers to directories on a volume.

If all the executables and libraries referenced in section C are on a single disk volume, section D contains only one subsection. If section C references multiple volumes, section D contains multiple subsections. A simple way to force this situation is to copy, say, NOTEPAD.EXE to a USB drive and start it from that volume. The corresponding prefetch file will then have one section D header referring to, e.g., \DEVICE\HARDDISK1\DP(1)0-0+4 (the USB drive), and one referring to, e.g., \DEVICE\HARDDISKVOLUME1\ (where the DLLs and other support files were found).

In this section, all offsets are assumed to be counted from the start of section D.

=== Volume information ===
The structure of the volume information is version dependent.

==== Volume information - version 17 ====
The volume information (version 17) is 40 bytes in size and consists of:
{| class="wikitable"
|-
! Field !! Offset !! Length !! Type !! Notes
|-
| VI1 || +0x0000 || 4 || DWORD || Offset to the volume device path (Unicode, terminated by U+0000)
|-
| VI2 || +0x0004 || 4 || DWORD || Length of the volume device path (number of characters, including the terminating U+0000)
|-
| VI3 || +0x0008 || 8 || FILETIME || Volume creation time
|-
| VI4 || +0x0010 || 4 || DWORD || Serial number of the volume indicated by the volume device path
|-
| VI5 || +0x0014 || 4 || DWORD || Offset to sub section E
|-
| VI6 || +0x0018 || 4 || DWORD || Length of sub section E (in bytes)
|-
| VI7 || +0x001C || 4 || DWORD || Offset to sub section F
|-
| VI8 || +0x0020 || 4 || DWORD || Number of strings in sub section F
|-
| VI9 || +0x0024 || 4 || ? || Unknown
|}

==== Volume information - version 23 ====
The volume information entry (version 23) is 104 bytes in size and consists of:
{| class="wikitable"
|-
! Field !! Offset !! Length !! Type !! Notes
|-
| VI1 || +0x0000 || 4 || DWORD || Offset to the volume device path (Unicode, terminated by U+0000)
|-
| VI2 || +0x0004 || 4 || DWORD || Length of the volume device path (number of characters, including the terminating U+0000)
|-
| VI3 || +0x0008 || 8 || FILETIME || Volume creation time
|-
| VI4 || +0x0010 || 4 || DWORD || Serial number of the volume indicated by the volume device path
|-
| VI5 || +0x0014 || 4 || DWORD || Offset to sub section E
|-
| VI6 || +0x0018 || 4 || DWORD || Length of sub section E (in bytes)
|-
| VI7 || +0x001C || 4 || DWORD || Offset to sub section F
|-
| VI8 || +0x0020 || 4 || DWORD || Number of strings in sub section F
|-
| VI9 || +0x0024 || 4 || ? || Unknown
|-
| VI10 || +0x0028 || 28 || ? || Unknown
|-
| VI11 || +0x0044 || 4 || ? || Unknown
|-
| VI12 || +0x0048 || 28 || ? || Unknown
|-
| VI13 || +0x0064 || 4 || ? || Unknown
|}
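The following sketch reads the first 36 bytes (VI1 to VI8), which versions 17 and 23 share, of each volume information entry. It makes two assumptions that go beyond the tables:
* the entries are stored back to back at the start of section D;
* version 26 entries are 104 bytes, like version 23.

`parse_volumes` is an illustrative name. The serial number is shown in the usual XXXX-XXXX form.

```python
import struct

VOLUME_ENTRY_SIZES = {17: 40, 23: 104, 26: 104}  # version 26 assumed to match version 23


def parse_volumes(data, d_offset, d_count, version):
    """Parse the shared fields VI1 to VI8 of each volume information entry."""
    size = VOLUME_ENTRY_SIZES[version]
    volumes = []
    for i in range(d_count):
        base = d_offset + i * size
        (path_off, path_chars, created, serial,
         e_off, e_len, f_off, f_count) = struct.unpack_from("<IIQIIIII", data, base)
        # Offsets in section D are relative to the start of section D.
        start = d_offset + path_off
        # path_chars includes the terminating U+0000, which is stripped here.
        path = data[start:start + 2 * path_chars].decode("utf-16-le").rstrip("\x00")
        volumes.append({
            "device_path": path,
            "creation_filetime": created,
            "serial": "%04X-%04X" % (serial >> 16, serial & 0xFFFF),
            "sub_section_e": (d_offset + e_off, e_len),
            "sub_section_f": (d_offset + f_off, f_count),
        })
    return volumes
```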

==== Volume information - version 26 ====
The volume information entry (version 26) appears to be similar to the version 23 volume information.

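=== Sub section E - NTFS file references ===
This sub section can contain NTFS file references.

For more information see [https://googledrive.com/host/0B3fBvzttpiiSbl9XZGZzQ05hZkU/Windows%20Prefetch%20File%20(PF)%20format.pdf Windows Prefetch File (PF) format].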

=== Sub section F - Directory strings ===
This sub section contains directory strings. The number of strings is stored in the volume information.

A directory string is stored in the following structure:
{| class="wikitable"
|-
! Field !! Offset !! Length !! Type !! Notes
|-
| || 0x0000 || 2 || WORD || Number of characters (WORDs) in the directory name, excluding the end-of-string character.
|-
| || 0x0002 || (number of characters + 1) x 2 || USTR || The directory name as a Unicode (UTF-16 little-endian) string, terminated by an end-of-string character (U+0000).
|}
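Reading sub section F amounts to looping over the structure above. This Python sketch assumes the directory strings are packed back to back with no padding between them. `parse_directory_strings` is an illustrative name. Here `f_offset` would be the sub section F offset (VI7) added to the start of section D.

```python
import struct


def parse_directory_strings(data, f_offset, count):
    """Read count directory strings: a WORD character count, then UTF-16LE text plus U+0000."""
    dirs, pos = [], f_offset
    for _ in range(count):
        (nchars,) = struct.unpack_from("<H", data, pos)
        pos += 2
        dirs.append(data[pos:pos + 2 * nchars].decode("utf-16-le"))
        pos += 2 * (nchars + 1)  # skip the string and its end-of-string character
    return dirs
```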

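== See Also ==
* [[Prefetch]]

== External Links ==
* [https://googledrive.com/host/0B3fBvzttpiiSbl9XZGZzQ05hZkU/Windows%20Prefetch%20File%20(PF)%20format.pdf Windows Prefetch File (PF) format], by the [[libssca|libssca project]]
* [http://bitbucket.cassidiancybersecurity.com/prefetch-parser/wiki/Home Windows Prefetch file format], by the [http://bitbucket.cassidiancybersecurity.com/prefetch-parser prefetch-parser] project.

[[Category:File Formats]]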