Difference between pages "Bulk extractor" and "Plaso"

From ForensicsWiki

= Bulk extractor =

== Overview ==
'''bulk_extractor''' is a computer forensics tool that scans a disk image, a file, or a directory of files and extracts useful information without parsing the file system or file system structures. The results can be easily inspected, parsed, or processed with automated tools. '''bulk_extractor''' also creates histograms of the features that it finds, because features that are more common tend to be more important. The program can be used for law enforcement, defense, intelligence, and cyber-investigation applications.

bulk_extractor is distinguished from other forensic tools by its speed and thoroughness. Because it ignores file system structure, bulk_extractor can process different parts of the disk in parallel. In practice, the program splits the disk into 16 MiB pages and processes one page on each available core, so a 24-core machine can in principle process a disk up to roughly 24 times faster than a 1-core machine. bulk_extractor is also thorough: it automatically detects, decompresses, and recursively re-processes data compressed with a variety of algorithms. Our testing has shown that there is a significant amount of compressed data in the unallocated regions of file systems that is missed by most forensic tools commonly in use today.
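
The page-parallel design described above can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not bulk_extractor's implementation (bulk_extractor is written in C++). A single stand-in regular expression for e-mail addresses plays the role of a scanner, and `scan_page` and `scan_image` are hypothetical names. The `margin` overlap read on both sides of each page is also an assumption: it is added so that a feature straddling a page boundary is reported exactly once, by the page in which it starts. The sketch uses worker processes rather than threads because Python's `re` module does not release the GIL.

```python
import os
import re
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from functools import partial

PAGE_SIZE = 16 * 1024 * 1024  # 16 MiB pages, as described above
MARGIN = 64 * 1024            # hypothetical overlap read before and after each page

# Deliberately simple e-mail pattern: a stand-in for a real scanner, not bulk_extractor's.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scan_page(path, page_size, margin, page_start):
    """Scan one page plus its margins; return (offset, feature) pairs that start in the page."""
    read_start = max(0, page_start - margin)
    with open(path, "rb") as f:
        f.seek(read_start)
        data = f.read(page_start - read_start + page_size + margin)
    hits = []
    for m in EMAIL_RE.finditer(data):
        offset = read_start + m.start()
        # A feature belongs to the page it starts in, so a feature that crosses a
        # page boundary is reported once (by the earlier page), not twice.
        if page_start <= offset < page_start + page_size:
            hits.append((offset, m.group().decode("ascii")))
    return hits


def scan_image(path, page_size=PAGE_SIZE, margin=MARGIN, use_processes=True):
    """Split the input into fixed-size pages and scan them in parallel, one page per task."""
    size = os.path.getsize(path)
    starts = range(0, size, page_size)
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    worker = partial(scan_page, path, page_size, margin)
    with pool_cls() as pool:
        pages = list(pool.map(worker, starts))
    # Merge the per-page results into a single list ordered by offset.
    return sorted(hit for page in pages for hit in page)
```

For example, `scan_image("disk.raw")` (a hypothetical image path) returns every matching address with its byte offset. On platforms that start worker processes with spawn (Windows, macOS), call it from under an `if __name__ == "__main__":` guard. Features longer than `margin` can still be cut at a page edge, which is one reason a real tool needs a carefully chosen overlap.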

Another advantage of ignoring file systems is that bulk_extractor can be used to process any digital media. We have used the program to process hard drives, SSDs, optical media, camera cards, cell phones, network packet dumps, and other kinds of digital information.

==Output Feature Files==

bulk_extractor creates an output directory that includes:
* '''ccn.txt''' -- Credit card numbers
* '''ccn_track2.txt''' -- Credit card “track 2” information
* '''domain.txt''' -- Internet domains found on the drive, including dotted-quad addresses found in text.
* '''email.txt''' -- Email addresses
* '''ether.txt''' -- Ethernet MAC addresses found through IP packet carving of swap files, compressed system hibernation files, and file fragments.
* '''exif.txt''' -- EXIFs from JPEGs and video segments. This feature file contains all of the EXIF fields, expanded as XML records.
* '''find.txt''' -- The results of specific regular expression search requests.
* '''ip.txt''' -- IP addresses found through IP packet carving.
* '''telephone.txt''' -- US and international telephone numbers.
* '''url.txt''' -- URLs, typically found in browser caches, email messages, and pre-compiled into executables.
* '''url_searches.txt''' -- A histogram of terms used in Internet searches from services such as Google, Bing, Yahoo, and others.
* '''wordlist.txt''' -- A list of all “words” extracted from the disk, useful for password cracking.
* '''wordlist_*.txt''' -- The wordlist with duplicates removed, formatted so that it can be easily imported into a popular password-cracking program.
* '''zip.txt''' -- A file containing information about every ZIP file component found on the media. This is exceptionally useful because ZIP files have internal structure, and ZIP is increasingly the compound file format of choice for a variety of products, such as Microsoft Office.

For each of the above, two additional files may be created:
* '''*_stopped.txt''' -- bulk_extractor supports a stop list: a list of items that do not need to be brought to the user’s attention. However, rather than simply suppressing this information, which might cause something critical to be hidden, bulk_extractor stores stopped entries in the stopped files.
* '''*_histogram.txt''' -- bulk_extractor can also create histograms of features. This is important because experience has shown that email addresses, domain names, URLs, and other information that appear more frequently on a hard drive or in a cell phone’s memory can be used to rapidly create a pattern-of-life report.
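
The following is a minimal Python sketch of how the feature, stop-list, and histogram outputs described above relate to each other. It makes several assumptions. Each non-comment line of a feature file is taken to be `offset<TAB>feature<TAB>context`, and lines starting with `#` are treated as comments. The stop list is modeled as a plain set of feature strings. The names `parse_feature_lines`, `apply_stop_list`, and `histogram` are hypothetical and are not bulk_extractor's own code. Offsets are kept as strings because they may not be plain byte numbers, for example when a feature was found inside decompressed data.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Feature(NamedTuple):
    offset: str   # kept as text: it may describe a location inside decompressed data
    feature: str
    context: str


def parse_feature_lines(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield features from lines assumed to be 'offset<TAB>feature<TAB>context'."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):  # skip blank lines and '#' comment lines
            continue
        parts = line.split("\t", 2)
        if len(parts) < 2:
            continue  # not a feature record
        context = parts[2] if len(parts) == 3 else ""
        yield Feature(parts[0], parts[1], context)


def apply_stop_list(features: Iterable[Feature], stop: set[str]) -> tuple[list[Feature], list[Feature]]:
    """Split features into (reported, stopped) rather than silently discarding stopped ones."""
    reported: list[Feature] = []
    stopped: list[Feature] = []
    for f in features:
        (stopped if f.feature in stop else reported).append(f)
    return reported, stopped


def histogram(features: Iterable[Feature]) -> list[tuple[str, int]]:
    """Count occurrences of each distinct feature, most frequent first."""
    return Counter(f.feature for f in features).most_common()


if __name__ == "__main__":
    sample = [
        "# comment lines like this are skipped",
        "1000\talice@example.com\tmail to alice@example.com today",
        "5000\tbob@example.org\tfrom: bob@example.org",
        "9000\talice@example.com\tcc: alice@example.com",
    ]
    reported, stopped = apply_stop_list(parse_feature_lines(sample), {"bob@example.org"})
    print(histogram(reported))  # [('alice@example.com', 2)]
    print(len(stopped))         # 1
```

Keeping the stopped entries in their own list mirrors the idea behind the `*_stopped.txt` files. Nothing matched by the stop list is lost, so it can still be reviewed if the stop list turns out to hide something important.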

bulk_extractor also creates a file that captures the provenance of the run:
;report.xml
:A Digital Forensics XML (DFXML) report that includes information about the source media, how the bulk_extractor program was compiled and run, the time taken to process the digital evidence, and a meta report of the information that was found.

==Post-Processing==

We have developed four programs for post-processing the bulk_extractor output:
;bulk_diff.py
:This program reports the differences between two bulk_extractor runs. The intent is to image a computer, run bulk_extractor on the disk image, let the computer run for a period of time, re-image the computer, run bulk_extractor on the second image, and then report the differences. This can be used to infer the user’s activities within that time period.
;cda_tool.py
:This tool, currently under development, reads bulk_extractor reports from multiple runs against multiple drives and performs a multi-drive correlation using Garfinkel’s Cross Drive Analysis technique. This can be used to automatically identify new social networks or to identify new members of existing networks.
;identify_filenames.py
:In a bulk_extractor feature file, each feature is annotated with the byte offset from the beginning of the image at which it was found. identify_filenames.py takes as input a bulk_extractor feature file and a DFXML file containing the location of each file on the drive (produced with Garfinkel’s fiwalk program), and produces an annotated feature file that contains the offset, the feature, and the file in which the feature was found.
;make_context_stop_list.py
:Although forensic analysts frequently make “stop lists” (for example, a list of email addresses that appear in the operating system and should therefore be ignored), such lists have a significant problem. Because it is relatively easy to get an email address into the binary of an open source application, ignoring all of these email addresses may make it possible to cloak email addresses from forensic analysis. Our solution is to create context-sensitive stop lists, in which the feature to be stopped is presented with the context in which it occurs. The make_context_stop_list.py program takes the results of multiple bulk_extractor runs and creates a single context-sensitive stop list that can then be used to suppress features when they are found in a specific context. One such stop list, constructed from Windows and Linux operating systems, is available on the bulk_extractor website.

== Download ==
The current version of '''bulk_extractor''' is 1.3.

* Downloads are available at: http://digitalcorpora.org/downloads/bulk_extractor/
* A Windows installer with the GUI can be downloaded from: http://digitalcorpora.org/downloads/bulk_extractor/executables/be_installer-1.3.exe

== Bibliography ==
=== Academic Publications ===
# Garfinkel, Simson, [http://simson.net/clips/academic/2013.COSE.bulk_extractor.pdf "Digital media triage with bulk data analysis and bulk_extractor"], Computers and Security 32: 56-72 (2013).
# Beverly, Robert, Simson Garfinkel and Greg Cardwell, [http://simson.net/clips/academic/2011.DFRWS.ipcarving.pdf "Forensic Carving of Network Packets and Associated Data Structures"], DFRWS 2011, Aug. 1-3, 2011, New Orleans, LA. Best Paper Award (Acceptance rate: 23%, 14/62).
# Garfinkel, Simson, [http://simson.net/clips/academic/2006.DFRWS.pdf "Forensic Feature Extraction and Cross-Drive Analysis"], The 6th Annual Digital Forensic Research Workshop, Lafayette, Indiana, August 14-16, 2006. (Acceptance rate: 43%, 16/37).

===YouTube===
'''[http://www.youtube.com/results?search_query=bulk_extractor search YouTube] for bulk_extractor videos'''
* [http://www.youtube.com/watch?v=odvDTGA7rYI Simson Garfinkel speaking at CERIAS about bulk_extractor]
* [http://www.youtube.com/watch?v=wTBHM9DeLq4 BackTrack 5 with bulk_extractor]
* [http://www.youtube.com/watch?v=QVfYOvhrugg Ubuntu 12.04 forensics with bulk_extractor]
* [http://www.youtube.com/watch?v=57RWdYhNvq8 Social Network forensics with bulk_extractor]

===Tutorials===
# [http://simson.net/ref/2012/2012-08-08%20bulk_extractor%20Tutorial.pdf Using bulk_extractor for digital forensics triage and cross-drive analysis], DFRWS 2012

= Plaso =
Revision as of 00:48, 4 June 2014

{{Infobox_Software |
  name = plaso |
  maintainer = [[Kristinn Gudjonsson]], [[Joachim Metz]] |
  os = [[Linux]], [[Mac OS X]], [[Windows]] |
  genre = {{Analysis}} |
  license = {{APL}} |
  website = [https://code.google.com/p/plaso/ code.google.com/p/plaso/] |
}}

Plaso (''plaso langar að safna öllu'', Icelandic for "plaso wants to collect everything") is the Python-based back-end engine used by tools such as log2timeline for the automatic creation of super timelines. The goal of log2timeline (and thus plaso) is to provide a single tool that can parse various log files and forensic artifacts from computers and related systems, such as network equipment, to produce a single correlated timeline. This timeline can then be easily analysed by forensic investigators/analysts, speeding up investigations by correlating the vast amount of information found on an average computer system. Plaso is primarily intended for creating super timelines, but it also supports creating [http://blog.kiddaland.net/2013/02/targeted-timelines-part-i.html targeted timelines].

The Plaso project site also provides [[4n6time]], formerly "l2t_Review", which is a cross-platform forensic tool for timeline creation and review by [[David Nides]].

== Supported Formats ==
The information below is based on version 1.1.0.

=== Storage Media Image File Formats ===
Storage Media Image File Format support is provided by [[dfvfs]].

=== Volume System Formats ===
Volume System Format support is provided by [[dfvfs]].

=== File System Formats ===
File System Format support is provided by [[dfvfs]].

=== File formats ===
* Apple System Log (ASL)
* Basic Security Module (BSM)
* Bencode files
* [[Google Chrome|Chrome cache files]]
* CUPS IPP
* [[Extensible Storage Engine (ESE) Database File (EDB) format]] using [[libesedb]]
* Firefox Cache
* Java IDX
* MacOS-X Application firewall
* MacOS-X Keychain
* MacOS-X Securityd
* MacOS-X Wifi
* ([[SleuthKit]]) mactime logs
* McAfee Anti-Virus Logs
* Microsoft [[Internet Explorer History File Format]] (also known as MSIE 4 - 9 Cache Files or index.dat) using [[libmsiecf]]
* [[OLE Compound File]] using [[libolecf]]
* [[Opera|Opera Browser history]]
* OpenXML
* Pcap files
* Popularity Contest log
* [[Property list (plist)|Property list (plist) format]] using [[binplist]]
* SELinux audit logs
* SkyDrive log and error log files
* [[SQLite database format]] using [[SQLite]]
* Symantec AV Corporate Edition and Endpoint Protection log
* Syslog
* UTMP
* UTMPX
* [[Windows Event Log (EVT)]] using [[libevt]]
* Windows Firewall
* Windows Job files (also known as "at jobs")
* Windows Prefetch files
* Windows Recycle bin (INFO2 and $I/$R)
* [[Windows NT Registry File (REGF)]] using [[libregf]]
* [[LNK|Windows Shortcut File (LNK) format]] using [[liblnk]]
* [[Windows XML Event Log (EVTX)]] using [[libevtx]]
* Xchat and Xchat scrollback files

=== Bencode file formats ===
* Transmission
* uTorrent

=== ESE database file formats ===
* Internet Explorer WebCache format

=== OLE Compound File formats ===
* Document summary information
* Summary information (top-level only)

=== Property list (plist) formats ===
* Airport
* Apple Account
* Bluetooth
* Install History
* iPod/iPhone
* Mac User
* Safari history
* Software Update
* Spotlight
* Spotlight Volume Information
* Timemachine

=== SQLite database file formats ===
* Android call logs
* Android SMS
* Chrome cookies
* [[Google Chrome|Chrome browsing and downloads history]]
* [[Mozilla Firefox|Firefox browsing and downloads history]]
* Google Drive
* Launch services quarantine events
* MacKeeper cache
* Mac OS X document versions
* Skype text conversations
* [[Zeitgeist|Zeitgeist activity database]]

=== [[Windows Registry]] formats ===
* [[Windows Application Compatibility|AppCompatCache]]
* CCleaner
* Less Frequently Used
* MountPoints2
* MRUList and MRUListEx (no shell item support)
* [[Internet Explorer|MSIE Zones]]
* Office MRU
* Outlook Search
* Run Keys
* Services
* Terminal Server MRU
* Typed URLs
* USBStor
* UserAssist
* WinRar
* Windows version information

== History ==
Plaso is a Python-based rewrite of the Perl-based [[log2timeline]] initially created by [[Kristinn Gudjonsson]]. Plaso builds upon the [[SleuthKit]], [[libyal]], [[dfvfs]] and various other projects.

== See Also ==
* [[dfvfs]]
* [[log2timeline]]

== External Links ==
* [https://code.google.com/p/plaso/ Project site]
* [https://sites.google.com/a/kiddaland.net/plaso/home Project documentation]
* [http://blog.kiddaland.net/ Project blog]
* [https://sites.google.com/a/kiddaland.net/plaso/usage/4n6time 4n6time]