Difference between pages "Windows" and "Bulk extractor"

From ForensicsWiki
(Difference between pages)
Jump to: navigation, search
(User Account Control)
 
m (current version is 1.4.4)
 
Line 1: Line 1:
{{Expand}}
+
== Overview ==
 +
'''bulk_extractor''' is a computer forensics tool that scans a disk image, a file, or a directory of files and extracts useful information without parsing the file system or file system structures. The results can be easily inspected, parsed, or processed with automated tools. '''bulk_extractor''' also created a histograms of features that it finds, as features that are more common tend to be more important. The program can be used for law enforcement, defense, intelligence, and cyber-investigation applications.
  
'''Windows''' is a widely-spread [[operating system]] from [[Microsoft]].
+
bulk_extractor is distinguished from other forensic tools by its speed and thoroughness. Because it ignores file system structure, bulk_extractor can process different parts of the disk in parallel. In practice, the program splits the disk up into 16MiByte pages and processes one page on each available core. This means that 24-core machines process a disk roughly 24 times faster than a 1-core machine. bulk_extractor is also thorough. That’s because bulk_extractor automatically detects, decompresses, and recursively re-processes compressed data that is compressed with a variety of algorithms. Our testing has shown that there is a significant amount of compressed data in the unallocated regions of file systems that is missed by most forensic tools that are commonly in use today.
  
There are 2 main branches of Windows:
+
Another advantage of ignoring file systems is that bulk_extractor can be used to process any digital media. We have used the program to process hard drives, SSDs, optical media, camera cards, cell phones, network packet dumps, and other kinds of digital information.
* the DOS-branch: i.e. Windows 95, 98, ME
+
* the NT-branch: i.e. Windows NT 4, XP, Vista
+
  
== Features ==
+
==Output Feature Files==
* Basic and Dynamic Disks, see: [http://msdn.microsoft.com/en-us/library/windows/desktop/aa363785(v=vs.85).aspx]
+
  
=== Introduced in Windows NT ===
+
bulk_extractor now creates an output directory that includes:
* [[NTFS]]
+
* '''ccn.txt''' -- Credit card numbers
 +
* '''ccn_track2.txt''' -- Credit card “track 2″ information
 +
* '''domain.txt''' -- Internet domains found on the drive, including dotted-quad addresses found in text.
 +
* '''email.txt''' -- Email addresses
 +
* '''ether.txt''' -- Ethernet MAC addresses found through IP packet carving of swap files and compressed system hibernation files and file fragments.
 +
* '''exif.txt''' -- EXIFs from JPEGs and video segments. This feature file contains all of the EXIF fields, expanded as XML records.
 +
* '''find.txt''' -- The results of specific regular expression search requests.
 +
* '''ip.txt''' -- IP addresses found through IP packet carving.
 +
* '''telephone.txt''' --- US and international telephone numbers.
 +
* '''url.txt''' --- URLs, typically found in browser caches, email messages, and pre-compiled into executables.
 +
* '''url_searches.txt''' --- A histogram of terms used in Internet searches from services such as Google, Bing, Yahoo, and others.
 +
* '''wordlist.txt''' --- :A list of all “words” extracted from the disk, useful for password cracking.
 +
* '''wordlist_*.txt''' --- The wordlist with duplicates removed, formatted in a form that can be easily imported into a popular password-cracking program.
 +
* '''zip.txt''' --- A file containing information regarding every ZIP file component found on the media. This is exceptionally useful as ZIP files contain internal structure and ZIP is increasingly the compound file format of choice for a variety of products such as Microsoft Office
  
=== Introduced in Windows 2000 ===
+
For each of the above, two additional files may be created:
 +
* '''*_stopped.txt''' --- bulk_extractor supports a stop list, or a list of items that do not need to be brought to the user’s attention. However rather than simply suppressing this information, which might cause something critical to be hidden, stopped entries are stored in the stopped files.
 +
* '''*_histogram.txt''' --- bulk_extractor can also create histograms of features. This is important, as experience has shown that email addresses, domain names, URLs, and other information that appear more frequently on a hard drive or in a cell phone’s memory can be used to rapidly create a pattern of life report.
  
=== Introduced in Windows XP ===
+
Bulk extractor also creates a file that captures the provenance of the run:
* [[Prefetch]]
+
;report.xml
* System Restore (Restore Points); also present in Windows ME
+
:A Digital Forensics XML report that includes information about the source media, how the bulk_extractor program was compiled and run, the time to process the digital evidence, and a meta report of the information that was found.
  
==== SP2 ====
+
==Post-Processing==
* Windows Firewall
+
  
=== Introduced in Windows Server 2003 ===
+
We have developed four programs for post-processing the bulk_extractor output:
* Volume Shadow Copies
+
;bulk_diff.py
 +
:This program reports the differences between two bulk_extractor runs. The intent is to image a computer, run bulk_extractor on a disk image, let the computer run for a period of time, re-image the computer, run bulk_extractor on the second image, and then report the differences. This can be used to infer the user’s activities within a time period.
 +
;cda_tool.py
 +
:This tool, currently under development, reads multiple bulk_extractor reports from multiple runs against multiple drives and performs a multi-drive correlation using Garfinkel’s Cross Drive Analysis technique. This can be used to automatically identify new social networks or to identify new members of existing networks.
 +
;identify_filenames.py
 +
:In the bulk_extractor feature file, each feature is annotated with the byte offset from the beginning of the image in which it was found. The program takes as input a bulk_extractor feature file and a DFXML file containing the locations of each file on the drive (produced with Garfinkel’s fiwalk program) and produces an annotated feature file that contains the offset, feature, and the file in which the feature was found.
 +
;make_context_stop_list.py
 +
:Although forensic analysts frequently make “stop lists”—for example, a lsit of email addresses that appear in the operating system and should therefore be ignored—such lists have a significant problem. Because it is relatively easy to get an email address into the binary of an open source application, ignoring all of these email addresses may make it possible to cloak email addresses from forensic analysis. Our solution is to create context-sensitive stop lists, in which the feature to be stopped is presented with the context in which it occures. The make_context_stop_list.py program takes the results of multiple bulk_extractor runs and creates a single context-sensitive stop list that can then be used to suppress features when found in a specific context. One such stop list constructed from Windows and Linux operating systems is available on the bulk extractor website.
  
=== Introduced in [[Windows Vista]] ===  
+
== Download ==
* [[BitLocker Disk Encryption | BitLocker]]
+
The current version of '''bulk_extractor''' is 1.4.4.  
* [[Windows Desktop Search | Search]] integrated in operating system
+
* [[ReadyBoost]]
+
* [[SuperFetch]]
+
* [[NTFS|Transactional NTFS (TxF)]]
+
* [[Windows NT Registry File (REGF)|Transactional Registry (TxR)]]
+
* [[Windows Shadow Volumes|Shadow Volumes]]; the volume-based storage of the Volume Shadow Copy data
+
* $Recycle.Bin
+
* [[Windows XML Event Log (EVTX)]]
+
* [[User Account Control (UAC)]]
+
  
=== Introduced in Windows Server 2008 ===
+
* Downloads are available at: http://digitalcorpora.org/downloads/bulk_extractor/
 +
* A WIndows installer with the GUI can be downloaded from: http://www.digitalcorpora.org/downloads/bulk_extractor/bulk_extractor-1.4.1-windowsinstaller.exe
  
=== Introduced in [[Windows 7]] ===
+
== Bibliography ==
* [[BitLocker Disk Encryption | BitLocker To Go]]
+
=== Academic Publications ===
* [[Jump Lists]]
+
# Garfinkel, Simson, [http://simson.net/clips/academic/2013.COSE.bulk_extractor.pdf Digital media triage with bulk data analysis and bulk_extractor]. Computers and Security 32: 56-72 (2013)
* [[Sticky Notes]]
+
# Beverly, Robert, Simson Garfinkel and Greg Cardwell, [http://simson.net/clips/academic/2011.DFRWS.ipcarving.pdf "Forensic Carving of Network Packets and Associated Data Structures"], DFRWS 2011, Aug. 1-3, 2011, New Orleans, LA. BEST PAPER AWARD (Acceptance rate: 23%, 14/62)
 +
#Garfinkel, S., [http://simson.net/clips/academic/2006.DFRWS.pdf Forensic Feature Extraction and Cross-Drive Analysis,]The 6th Annual Digital Forensic Research Workshop Lafayette, Indiana, August 14-16, 2006. (Acceptance rate: 43%, 16/37)
  
=== Introduced in [[Windows 8]] ===
+
===YouTube===
* [[Windows File History | File History]]
+
'''[http://www.youtube.com/results?search_query=bulk_extractor search YouTube] for bulk_extractor videos'''
* [[Windows Storage Spaces | Storage Spaces]]
+
* [http://www.youtube.com/watch?v=odvDTGA7rYI Simson Garfinkel speaking at CERIAS about bulk_extractor]
* [[Search Charm History]]
+
* [http://www.youtube.com/watch?v=wTBHM9DeLq4 BackTrack 5 with bulk_extractor]
* [[Resilient File System (ReFS)]]; Was initially available in the Windows 8 server edition.
+
* [http://www.youtube.com/watch?v=QVfYOvhrugg Ubuntu 12.04 forensics with bulk_extractor]
 +
* [http://www.youtube.com/watch?v=57RWdYhNvq8 Social Network forensics with bulk_extractor]
  
=== Introduced in Windows Server 2012 ===
+
===Tutorials===
* [[Resilient File System (ReFS)]]
+
# [http://simson.net/ref/2012/2012-08-08%20bulk_extractor%20Tutorial.pdf Using bulk_extractor for digital forensics triage and cross-drive analysis], DFRWS 2012
 
+
== Forensics ==
+
 
+
=== Partition layout ===
+
Default partition layout, first partition starts:
+
* at sector 63 in Windows 2000, XP, 2003
+
* at sector 2048 in Windows Vista, 2008, 7
+
 
+
=== Filesystems ===
+
* [[FAT]], [[FAT|exFAT]]
+
* [[NTFS]]
+
* [[Resilient File System (ReFS) | ReFS]]
+
 
+
=== Recycle Bin ===
+
 
+
==== RECYCLER ====
+
Used by Windows 2000, XP.
+
Uses INFO2 file.
+
 
+
See: [http://www.cybersecurityinstitute.biz/downloads/INFO2.pdf]
+
 
+
==== $RECYCLE.BIN ====
+
Used by Windows Vista.
+
Uses $I and $R files.
+
 
+
See: [http://www.forensicfocus.com/downloads/forensic-analysis-vista-recycle-bin.pdf]
+
 
+
=== Registry ===
+
 
+
The [[Windows Registry]] is a database of keys and values that provides a wealth of information to forensic [[investigator]]s.
+
 
+
=== Thumbs.db Files ===
+
 
+
[[Thumbs.db]] files can be found on many Windows systems. They contain thumbnails of images or documents and can be of great value for the [[investigator]].
+
 
+
See also: [[Vista thumbcache]].
+
 
+
=== Browser Cache ===
+
 
+
=== Browser History ===
+
 
+
The [[Web Browser History]] files can contain significant information. The default [[Web browser|web browser]] that comes with Windows is [[Internet Explorer|Microsoft Internet Explorer]] but other common browsers on Windows are [[Apple Safari]], [[Google Chrome]], [[Mozilla Firefox]] and [[Opera]].
+
 
+
=== Search ===
+
See [[Windows Desktop Search]]
+
 
+
=== Setup log files (setupapi.log) ===
+
Windows Vista introduced several setup log files [http://support.microsoft.com/kb/927521].
+
 
+
=== Sleep/Hibernation ===
+
 
+
After (at least) Windows 7 recovers from sleep/hibernation there often is a system time change event (event id 1) in the event logs.
+
 
+
=== Users ===
+
Windows stores a users Security identifiers (SIDs) under the following registry key:
+
<pre>
+
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList
+
</pre>
+
 
+
The %SID%\ProfileImagePath value should also contain the username.
+
 
+
=== Windows Error Reporting (WER) ===
+
 
+
As of Vista, for User Access Control (UAC) elevated applications WER reports can be found in:
+
<pre>
+
C:\ProgramData\Microsoft\Windows\WER\
+
</pre>
+
 
+
As of Vista, for non-UAC elevated applications (LUA) WER reports can be found in:
+
<pre>
+
C:\Users\%UserName%\AppData\Local\Microsoft\Windows\WER\
+
</pre>
+
 
+
Corresponding registry key:
+
<pre>
+
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting
+
</pre>
+
 
+
== Advanced Format (4KB Sector) Hard Drives ==
+
Windows XP does not natively handle drives that use the new standard of 4KB sectors. For information on this, see [[Advanced Format]].
+
 
+
== %SystemRoot% ==
+
The actual value of %SystemRoot% is store in the following registry value:
+
<pre>
+
Key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\
+
Value: SystemRoot
+
</pre>
+
 
+
== See Also ==
+
* [[Windows Event Log (EVT)]]
+
* [[Windows XML Event Log (EVTX)]]
+
* [[Windows Vista]]
+
* [[Windows 7]]
+
* [[Windows 8]]
+
 
+
== External Links ==
+
 
+
* [http://en.wikipedia.org/wiki/Microsoft_Windows Wikipedia: Microsoft Windows]
+
* [http://support.microsoft.com/kb/927521 Windows 7, Windows Server 2008 R2, and Windows Vista setup log file locations]
+
* [http://www.forensicfocus.com/downloads/forensic-analysis-vista-recycle-bin.pdf The Forensic Analysis of the Microsoft Windows Vista Recycle Bin], by [[Mitchell Machor]], 2008
+
* [http://www.ericjhuber.com/2013/02/microsoft-file-system-tunneling.html?m=1 Microsoft Windows File System Tunneling], by [[Eric Huber]], February 24, 2013
+
* [http://www.nsa.gov/ia/_files/app/Spotting_the_Adversary_with_Windows_Event_Log_Monitoring.pdf Spotting the Adversary with Windows Event Log Monitoring], by National Security Agency/Central Security Service, February 28, 2013
+
 
+
=== Malware/Rootkits ===
+
* [http://forensicmethods.com/inside-windows-rootkits Inside Windows Rootkits], by [[Chad Tilbury]], September 4, 2013
+
 
+
=== Program execution ===
+
* [http://windowsir.blogspot.com/2013/07/howto-determine-program-execution.html HowTo: Determine Program Execution], by [[Harlan Carvey]], July 06, 2013
+
* [http://journeyintoir.blogspot.com/2014/01/it-is-all-about-program-execution.html It Is All About Program Execution], by [[Corey Harrell]], January 14, 2014
+
* [http://sysforensics.org/2014/01/know-your-windows-processes.html Know your Windows Processes or Die Trying], by [[Patrick Olsen]], January 18, 2014
+
 
+
=== Tracking removable media ===
+
* [http://www.swiftforensics.com/2012/08/tracking-usb-first-insertion-in-event.html Tracking USB First insertion in Event logs], by Yogesh Khatri, August 18, 2012
+
 
+
=== Under the hood ===
+
* [http://msdn.microsoft.com/en-us/library/windows/desktop/aa366533(v=vs.85).aspx MSDN: Comparing Memory Allocation Methods], by [[Microsoft]]
+
* [http://blogs.msdn.com/b/ntdebugging/archive/2007/06/28/how-windows-starts-up-part-the-second.aspx How Windows Starts Up (Part the second)]
+
* [http://msdn.microsoft.com/en-us/library/aa375142.aspx DLL/COM Redirection]
+
* [http://msdn.microsoft.com/en-us/library/windows/desktop/ms682586(v=vs.85).aspx Dynamic-Link Library Search Order]
+
* [http://blogs.msdn.com/b/junfeng/archive/2004/04/28/121871.aspx Image File Execution Options]
+
 
+
==== MSI ====
+
* [http://blogs.msdn.com/b/heaths/archive/2009/02/02/changes-to-package-caching-in-windows-installer-5-0.aspx?Redirected=true Changes to Package Caching in Windows Installer 5.0], by Heath Stewart, February 2, 2009
+
* [http://blog.didierstevens.com/2013/07/26/msi-the-case-of-the-invalid-signature/ MSI: The Case Of The Invalid Signature], by Didier Stevens, July 26, 2013
+
 
+
==== Side-by-side (WinSxS) ====
+
* [http://en.wikipedia.org/wiki/Side-by-side_assembly Wikipedia: Side-by-side assembly]
+
* [http://msdn.microsoft.com/en-us/library/aa374224.aspx Assembly Searching Sequence]
+
* [http://blogs.msdn.com/b/junfeng/archive/2007/06/26/rt-manifest-resource-and-isolation-aware-enabled.aspx RT_MANIFEST resource, and ISOLATION_AWARE_ENABLED]
+
* [http://msdn.microsoft.com/en-us/library/windows/desktop/dd408052(v=vs.85).aspx Isolated Applications and Side-by-side Assemblies]
+
* [http://blogs.msdn.com/b/junfeng/archive/2006/01/24/517221.aspx#531208 DotLocal (.local) Dll Redirection], by [[Junfeng Zhang]], January 24, 2006
+
* [http://blogs.msdn.com/b/junfeng/archive/2006/04/14/576314.aspx Diagnosing SideBySide failures], by [[Junfeng Zhang]], April 14, 2006
+
* [http://omnicognate.wordpress.com/2009/10/05/winsxs/ EVERYTHING YOU NEVER WANTED TO KNOW ABOUT WINSXS]
+
 
+
==== Application Experience and Compatibility ====
+
* [http://technet.microsoft.com/en-us/library/dd837644(v=ws.10).aspx Technet: Understanding Shims], by [[Microsoft]]
+
* [http://msdn.microsoft.com/en-us/library/bb432182(v=vs.85).aspx MSDN: Application Compatibility Database], by [[Microsoft]]
+
* [http://www.alex-ionescu.com/?p=39 Secrets of the Application Compatilibity Database (SDB) – Part 1], by [[Alex Ionescu]], May 20, 2007
+
* [http://www.alex-ionescu.com/?p=40 Secrets of the Application Compatilibity Database (SDB) – Part 2], by [[Alex Ionescu]], May 21, 2007
+
* [https://dl.mandiant.com/EE/library/Whitepaper_ShimCacheParser.pdf Leveraging the Application Compatibility Cache in Forensic Investigations], by [[Andrew Davis]], May 4, 2012
+
* [http://journeyintoir.blogspot.ch/2013/12/revealing-recentfilecachebcf-file.html Revealing the RecentFileCache.bcf File], by [[Corey Harrell]], December 2, 2013
+
* [http://journeyintoir.blogspot.ch/2013/12/revealing-program-compatibility.html Revealing Program Compatibility Assistant HKCU AppCompatFlags Registry Keys], by [[Corey Harrell]], December 17, 2013
+
 
+
==== System Restore (Restore Points) ====
+
* [http://en.wikipedia.org/wiki/System_Restore Wikipedia: System Restore]
+
* [http://www.stevebunting.org/udpd4n6/forensics/restorepoints.htm Restore Point Forensics], by [[Steve Bunting]]
+
* [http://windowsir.blogspot.ch/2007/06/restore-point-analysis.html Restore Point Analysis], by [[Harlan Carvey]],  June 16, 2007
+
* [http://windowsir.blogspot.ch/2006/10/restore-point-forensics.html Restore Point Forensics], by [[Harlan Carvey]], October 20, 2006
+
* [http://www.ediscovery.co.nz/wip/srp.html System Restore Point Log Decoding]
+
 
+
==== Crash dumps ====
+
* [http://blogs.technet.com/b/yongrhee/archive/2010/12/29/drwtsn32-on-windows-vista-windows-server-2008-windows-7-windows-server-2008-r2.aspx Technet: Drwtsn32 on Windows Vista/Windows Server 2008/Windows 7/Windows Server 2008 R2], by Yong Rhee, December 29, 2010
+
* [http://support.microsoft.com/kb/315263 MSDN: How to read the small memory dump file that is created by Windows if a crash occurs], by [[Microsoft]]
+
 
+
==== RPC ====
+
* [http://blogs.technet.com/b/networking/archive/2008/10/24/rpc-to-go-v-1.aspx RPC to Go v.1], by Michael Platts, October 24, 2008
+
* [http://blogs.technet.com/b/networking/archive/2008/12/04/rpc-to-go-v-2.aspx RPC to Go v.2], by Michael Platts, December 4, 2008
+
 
+
==== User Account Control (UAC) ====
+
* [http://blog.strategiccyber.com/2014/03/20/user-account-control-what-penetration-testers-should-know/ User Account Control – What Penetration Testers Should Know], by Raphael Mudge, March 20, 2014
+
 
+
==== Windows Scripting Host ====
+
* [https://www.mandiant.com/blog/ground-windows-scripting-host-wsh/ Going To Ground with The Windows Scripting Host (WSH)], by Devon Kerr, February 19, 2014
+
 
+
==== USB ====
+
* [https://blogs.sans.org/computer-forensics/files/2009/09/USBKEY-Guide.pdf USBKEY Guide], by [[SANS | SANS Institute - Digital Forensics and Incident Response]], September 2009
+
* [https://blogs.sans.org/computer-forensics/files/2009/09/USB_Drive_Enclosure-Guide.pdf USB Drive Enclosure Guide], by [[SANS | SANS Institute - Digital Forensics and Incident Response]], September 2009
+
 
+
==== WMI ====
+
* [http://www.trendmicro.com/cloud-content/us/pdfs/security-intelligence/white-papers/wp__understanding-wmi-malware.pdf Understanding WMI Malware], by Julius Dizon, Lennard Galang, and Marvin Cruz, July 2010
+
 
+
==== Windows Error Reporting (WER) ====
+
* [http://blogs.technet.com/b/yongrhee/archive/2010/12/29/drwtsn32-on-windows-vista-windows-server-2008-windows-7-windows-server-2008-r2.aspx Drwtsn32 on Windows Vista/Windows Server 2008/Windows 7/Windows Server 2008 R2], by Yong Rhee, December 29, 2010
+
* [http://journeyintoir.blogspot.ch/2014/02/exploring-windows-error-reporting.html Exploring Windows Error Reporting], by [[Corey Harrell]], February 24, 2014
+
 
+
==== Windows Firewall ====
+
* [http://en.wikipedia.org/wiki/Windows_Firewall Wikipedia: Windows Firewall]
+
* [http://technet.microsoft.com/en-us/library/cc737845(v=ws.10).aspx#BKMK_log Windows Firewall Tools and Settings]
+
 
+
==== Windows 32-bit on Windows 64-bit (WoW64) ====
+
* [http://en.wikipedia.org/wiki/WoW64 Wikipedia: WoW64]
+
 
+
=== Windows XP ===
+
* [http://support.microsoft.com/kb/q308549 Description of Windows XP System Information (Msinfo32.exe) Tool]
+
 
+
[[Category:Operating systems]]
+

Latest revision as of 16:24, 19 May 2014

Overview

bulk_extractor is a computer forensics tool that scans a disk image, a file, or a directory of files and extracts useful information without parsing the file system or file system structures. The results can be easily inspected, parsed, or processed with automated tools. bulk_extractor also created a histograms of features that it finds, as features that are more common tend to be more important. The program can be used for law enforcement, defense, intelligence, and cyber-investigation applications.

bulk_extractor is distinguished from other forensic tools by its speed and thoroughness. Because it ignores file system structure, bulk_extractor can process different parts of the disk in parallel. In practice, the program splits the disk up into 16MiByte pages and processes one page on each available core. This means that 24-core machines process a disk roughly 24 times faster than a 1-core machine. bulk_extractor is also thorough. That’s because bulk_extractor automatically detects, decompresses, and recursively re-processes compressed data that is compressed with a variety of algorithms. Our testing has shown that there is a significant amount of compressed data in the unallocated regions of file systems that is missed by most forensic tools that are commonly in use today.

Another advantage of ignoring file systems is that bulk_extractor can be used to process any digital media. We have used the program to process hard drives, SSDs, optical media, camera cards, cell phones, network packet dumps, and other kinds of digital information.

Output Feature Files

bulk_extractor now creates an output directory that includes:

  • ccn.txt -- Credit card numbers
  • ccn_track2.txt -- Credit card “track 2″ information
  • domain.txt -- Internet domains found on the drive, including dotted-quad addresses found in text.
  • email.txt -- Email addresses
  • ether.txt -- Ethernet MAC addresses found through IP packet carving of swap files and compressed system hibernation files and file fragments.
  • exif.txt -- EXIFs from JPEGs and video segments. This feature file contains all of the EXIF fields, expanded as XML records.
  • find.txt -- The results of specific regular expression search requests.
  • ip.txt -- IP addresses found through IP packet carving.
  • telephone.txt --- US and international telephone numbers.
  • url.txt --- URLs, typically found in browser caches, email messages, and pre-compiled into executables.
  • url_searches.txt --- A histogram of terms used in Internet searches from services such as Google, Bing, Yahoo, and others.
  • wordlist.txt --- :A list of all “words” extracted from the disk, useful for password cracking.
  • wordlist_*.txt --- The wordlist with duplicates removed, formatted in a form that can be easily imported into a popular password-cracking program.
  • zip.txt --- A file containing information regarding every ZIP file component found on the media. This is exceptionally useful as ZIP files contain internal structure and ZIP is increasingly the compound file format of choice for a variety of products such as Microsoft Office

For each of the above, two additional files may be created:

  • *_stopped.txt --- bulk_extractor supports a stop list, or a list of items that do not need to be brought to the user’s attention. However rather than simply suppressing this information, which might cause something critical to be hidden, stopped entries are stored in the stopped files.
  • *_histogram.txt --- bulk_extractor can also create histograms of features. This is important, as experience has shown that email addresses, domain names, URLs, and other information that appear more frequently on a hard drive or in a cell phone’s memory can be used to rapidly create a pattern of life report.

Bulk extractor also creates a file that captures the provenance of the run:

report.xml
A Digital Forensics XML report that includes information about the source media, how the bulk_extractor program was compiled and run, the time to process the digital evidence, and a meta report of the information that was found.

Post-Processing

We have developed four programs for post-processing the bulk_extractor output:

bulk_diff.py
This program reports the differences between two bulk_extractor runs. The intent is to image a computer, run bulk_extractor on a disk image, let the computer run for a period of time, re-image the computer, run bulk_extractor on the second image, and then report the differences. This can be used to infer the user’s activities within a time period.
cda_tool.py
This tool, currently under development, reads multiple bulk_extractor reports from multiple runs against multiple drives and performs a multi-drive correlation using Garfinkel’s Cross Drive Analysis technique. This can be used to automatically identify new social networks or to identify new members of existing networks.
identify_filenames.py
In the bulk_extractor feature file, each feature is annotated with the byte offset from the beginning of the image in which it was found. The program takes as input a bulk_extractor feature file and a DFXML file containing the locations of each file on the drive (produced with Garfinkel’s fiwalk program) and produces an annotated feature file that contains the offset, feature, and the file in which the feature was found.
make_context_stop_list.py
Although forensic analysts frequently make “stop lists”—for example, a lsit of email addresses that appear in the operating system and should therefore be ignored—such lists have a significant problem. Because it is relatively easy to get an email address into the binary of an open source application, ignoring all of these email addresses may make it possible to cloak email addresses from forensic analysis. Our solution is to create context-sensitive stop lists, in which the feature to be stopped is presented with the context in which it occures. The make_context_stop_list.py program takes the results of multiple bulk_extractor runs and creates a single context-sensitive stop list that can then be used to suppress features when found in a specific context. One such stop list constructed from Windows and Linux operating systems is available on the bulk extractor website.

Download

The current version of bulk_extractor is 1.4.4.

Bibliography

Academic Publications

  1. Garfinkel, Simson, Digital media triage with bulk data analysis and bulk_extractor. Computers and Security 32: 56-72 (2013)
  2. Beverly, Robert, Simson Garfinkel and Greg Cardwell, "Forensic Carving of Network Packets and Associated Data Structures", DFRWS 2011, Aug. 1-3, 2011, New Orleans, LA. BEST PAPER AWARD (Acceptance rate: 23%, 14/62)
  3. Garfinkel, S., Forensic Feature Extraction and Cross-Drive Analysis,The 6th Annual Digital Forensic Research Workshop Lafayette, Indiana, August 14-16, 2006. (Acceptance rate: 43%, 16/37)

YouTube

search YouTube for bulk_extractor videos

Tutorials

  1. Using bulk_extractor for digital forensics triage and cross-drive analysis, DFRWS 2012