Difference between pages "Dd" and "Category:Digital Forensics XML"

From Forensics Wiki
(Use extreme caution if reading from a tape drive)
 
(Note non-draft status)
 
Line 1: Line 1:
{{Infobox_Software |
+
''Digital Forensics XML'' (DFXML) is the effort to create an XML schema to allow for easy interoperability between different forensic tools.  
  name = dd |
+
  maintainer = [[Paul Rubin]], [[David MacKenzie]], [[Stuart Kemp]] |
+
  os = {{Linux}}, {{Windows}}, {{Mac OS X}} |
+
  genre = {{Disk imaging}} |
+
  license = {{GPL}} |
+
  website = [ftp://ftp.gnu.org/gnu/coreutils/ ftp.gnu.org/gnu/coreutils/] |
+
}}
+
  
'''dd''', sometimes called '''GNU dd''', is the oldest [[Tools#Disk_Imaging_Tools|imaging tool]] still used. Although it is functional and requires only minimal resources to run, it lacks some of the useful features found in more modern imagers such as [[metadata]] gathering, error correction, piecewise hashing, and a user-friendly interface. dd is a command line program that uses several obscure command line arguments to control the imaging process. Because some of these flags are similar and, if confused, can destroy the source media the examiner is trying to duplicate, users should be careful when running this program. The program generates [[Raw image file|raw image files]] which can be read by many other programs.
+
There is a [https://github.com/dfxml-working-group/dfxml_schema Digital Forensics XML standard schema] that lets one validate a DFXML document with the xmllint utility.
  
dd is part of the [[GNU Coreutils]] package which in turn has been ported to many [[Operating system|operating systems]].  
+
Development on DFXML to date has involved creating a set of tools that can produce or ingest XML with a common set of tags. It would be nice to have a more aggressive effort, but to date there has not been sufficient funding.
  
There are a few forks of dd for forensic purposes including [[dcfldd]], [[sdd]], [[dd_rescue]], [[ddrescue]], [[dccidd]], and a [[Windows|Microsoft Windows]] version that supports reading [[physical memory]].
+
Given this state of affairs, our current strategy is to:
  
== Example ==
+
* Develop a set of standardized tags and data representations for current XML tools.
 +
* Modify our tools to produce XML similar to the sample XML.
 +
* Develop a DTD and schema to allow XML validation.
  
Here are two common dd command lines:
+
==Tools==
  
'''UNIX/Linux'''
+
===Tools that produce DFXML===
 +
If you want to work with DFXML, you may wish to start with the [https://github.com/simsong/dfxml DFXML package on github].
  
dd if=/dev/hda of=mybigfile.img bs=65536 conv=noerror,sync
+
The following tools are known to produce DFXML:
 +
* The [[fiwalk]] C++ program produces DFXML for files from disk images using SleuthKit.
 +
* [[frag_find]], the hash-based carver, uses DFXML to document where files physically reside in the disk image.
 +
* [[photorec]], the popular carver, uses DFXML to document its configuration and where files physically reside in the disk image.
 +
* [[bulk_extractor]] uses DFXML to report the configuration of each run and the provenance of the input files.
 +
* [[afxml]], part of AFFLIB, converts metadata for disk images into DFXML format.
 +
* [[libewf | ewfinfo]], part of libewf, can output metadata for EWF disk images in DFXML format.
 +
* [[md5deep]], [[sha1deep]], [[hashdeep]], and the other programs in the md5deep package produce DFXML hash files when provided with the '''-d''' option.
  
'''Windows'''
+
The following tools are known to consume DFXML:
  
dd.exe if=\\.\PhysicalDrive0 of=d:\images\PhysicalDrive0.img --md5sum --verifymd5
+
* [https://github.com/simsong/dfxml dfxml.py and libdfxml], currently available for download on github.
--md5out=d:\images\PhysicalDrive0.img.md5
+
  
== Tips ==
+
===Tools that consume DFXML===
With Linux, in addition to imaging with
+
* [[frag_find]], the hash-based carver, will be able to read piecewise hash files in DFXML format.
dd if=/dev/hda of=mybigfile.img bs=65536 conv=noerror,sync
+
* iblkfind.py, part of the [[fiwalk]] distribution, will report the file associated with any disk sector.
 +
* identify_filenames.py, part of the [[bulk_extractor]] distribution, will take a bulk_extractor feature file and annotate it with the names of the files from which each feature was extracted.
 +
* idifference.py, part of the [[fiwalk]] distribution, will report the difference between two disk images.
 +
* imap.py, part of the [[fiwalk]] distribution, will draw a map of what's on a disk. Only useful for small partitions.
 +
* imicrosoft_redact.py, part of the [[fiwalk]] distribution, will break the Microsoft binaries in a disk image.
 +
* iverify.py, part of the [[fiwalk]] distribution, will verify that the contents of files in a DFXML file haven't been changed.
 +
* The [https://github.com/anarchivist/dfxml dfxml gem for Ruby], mostly used to process [[fiwalk]] output
 +
* [https://github.com/anarchivist/gumshoe Gumshoe], a Ruby/Solr-based search interface for metadata extracted from disk images
  
You can wipe a drive with:
+
===Tools that transform DFXML===
dd if=/dev/zero of=/dev/hda bs=4K conv=noerror,sync
+
* sanitize_xml.py, part of the fiwalk distribution, will remove personally-identifiable information in filenames and directory names from a DFXML file.
  
For imaging, a useful alternative invocation on Linux or UNIX is:
 
dd if=/dev/hda bs=4K conv=sync,noerror | tee mybigfile.img | md5sum > mybigfile.md5
 
  
The above alternative imaging command uses dd to read the hard drive being imaged and outputs the data to tee. tee saves a copy of the data as your image file and also outputs a copy of the data to md5sum. md5sum calculates the hash, which gets saved in mybigfile.md5.
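The same read-once, write-and-hash pattern can be sketched in Python. This is only an illustration of the pipeline described above, not a replacement for dd: the function name <tt>image_and_hash</tt> and its parameters are hypothetical, and the sketch does not reproduce the error handling of <tt>conv=noerror,sync</tt>.

```python
import hashlib


def image_and_hash(src_path, img_path, block_size=4096):
    """Copy src_path to img_path block by block and return the MD5 of the data.

    Each block is read once, written to the image (the role of tee),
    and fed to the hash (the role of md5sum).
    Assumption: any read error simply raises; nothing is null filled.
    """
    md5 = hashlib.md5()
    with open(src_path, "rb") as src, open(img_path, "wb") as img:
        while True:
            block = src.read(block_size)
            if not block:  # end of input
                break
            img.write(block)
            md5.update(block)
    return md5.hexdigest()
```

Calling <tt>image_and_hash("/dev/hda", "mybigfile.img")</tt> would need read access to the device; the returned hex string corresponds to what md5sum would write to mybigfile.md5.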
+
===DFXML Toolkit===
 +
The following toolkits are useful for building new tools that read and write DFXML:
 +
* The dfxml.py Python module implements objects for reading and writing DFXML.
 +
* The xml.cpp and xml.h files that are included in the bulk_extractor and md5deep (version 4) source code are a good C++ implementation for DFXML generation.
 +
* The xml.c and xml.h files that are included in the photorec (new version) source code are a good C implementation for DFXML generation.
  
For all of the above
+
This toolkit is now available as a git project on github at https://github.com/simsong/dfxml
if            => input file
+
/dev/hda      => the Linux name of a physical disk. Mac OS X uses its own device names.
+
/dev/zero      => in Linux, this is an infinite source of nulls
+
of            => output file
+
mybigfile.img  => The name of the image file you are creating
+
bs            => [[blocksize]]
+
65536          => 64K  (I normally use 4K in Linux.  That is what the Linux kernel uses as a page size on x86.)
+
noerror        => don't die if you have a read error from the source drive
+
sync          => if there is an error, null fill the rest of the block.
+
  
In Linux, the blocksize value can have a multiplicative suffix:
+
===XML Forensics Tools and Toolkits===
c =1
+
* We are creating a DFXML strategy for distributing hash sets.
w =2
+
b =512
+
kB =1000,          K =1024
+
MB =1000*1000,      M =1024*1024
+
GB =1000*1000*1000, G =1024*1024*1024
+
and so on for T, P, E, Z, Y.  
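The suffix table above can be checked with a small Python helper. This is a sketch built only from the suffixes listed here; the name <tt>parse_dd_size</tt> is hypothetical, and other forms that some dd versions accept are deliberately left out.

```python
import re

# Multipliers taken from the table above: c, w, b, then the
# decimal (kB, MB, ...) and binary (K, M, ...) families.
_SUFFIXES = {"": 1, "c": 1, "w": 2, "b": 512}
for power, letter in enumerate("KMGTPEZY", start=1):
    _SUFFIXES[letter] = 1024 ** power
    decimal_name = ("k" if letter == "K" else letter) + "B"
    _SUFFIXES[decimal_name] = 1000 ** power


def parse_dd_size(text):
    """Convert a blocksize string such as '4K' or '1MB' to a byte count."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", text)
    if match is None or match.group(2) not in _SUFFIXES:
        raise ValueError("unrecognised size: %r" % text)
    return int(match.group(1)) * _SUFFIXES[match.group(2)]
```

For example, <tt>parse_dd_size("4K")</tt> gives 4096 and <tt>parse_dd_size("1MB")</tt> gives 1000000.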
+
  
Things to know:
+
==DFXML Bibliography==
 +
===Papers===
 +
#Garfinkel, S. [http://simson.net/clips/academic/2012.DI.dfxml.pdf Digital Forensics XML and the DFXML toolset], Digital Investigation, 2012.
 +
#Garfinkel, S. [http://simson.net/clips/academic/2009.SADFE.xml_forensics.pdf Automating Disk Forensic Processing with SleuthKit, XML and Python], Systematic Approaches to Digital Forensics Engineering (IEEE/SADFE 2009), Oakland, California. (Acceptance rate: 32%, 7/22)
 +
===Presentations===
 +
# [http://simson.net/ref/2011/2011-12-07%20DFXML.pdf Digital Forensic Tool Integration], Simson Garfinkel, December 7, 2011
  
Having a bigger blocksize is more efficient, but it also costs more data when a read fails. With a 1MB block, for example, a read error in the first sector makes dd null fill the entire MB.  Thus you should use as small a blocksize as feasible.
+
==See Also==
 
+
* [[fiwalk]]
But with Linux, if you go below a 4KB blocksize, you can hit really bad performance issues.  Using the default 512 byte block can be as much as 10x slower than using a 4KB block.
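The cost of a large block on a damaged source can be illustrated with a small simulation. This is a simplified model of the behaviour described above (one bad sector causes the whole block to be null filled), not a reimplementation of dd; the function name and the in-memory "disk" are assumptions made for the example.

```python
SECTOR = 512  # bytes per sector in this model


def simulate_noerror_sync(data, bad_sectors, block_size):
    """Return the image produced when any block touching a bad sector is null filled.

    data        -- bytes standing in for the source drive
    bad_sectors -- set of sector numbers that cannot be read
    block_size  -- the bs= value, a multiple of SECTOR
    """
    image = bytearray()
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        first = start // SECTOR
        last = (start + len(block) - 1) // SECTOR
        if any(sector in bad_sectors for sector in range(first, last + 1)):
            image += b"\x00" * block_size  # the whole block is lost
        else:
            image += block.ljust(block_size, b"\x00")  # pad a short final block
    return bytes(image)
```

With an 8192 byte source of 0xFF bytes and sector 0 unreadable, a 4096 byte block loses 4096 bytes, while a 512 byte block loses only the 512 bytes of the bad sector itself.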
+
* [http://www.cgsecurity.org/wiki/Data_Carving_Log XML Log Sample for photorec]
 
+
* [http://mark0.net/soft-tridscan-e.html TrIDScan], which has an XML language to describe file types.
Without noerror and sync, you basically don't have a forensic image.  For forensic images they are mandatory.
+
* [https://github.com/simsong/dfxml DFXML toolkit on Github]
 
+
[[Category:Top-Level]]
dd by itself does not hash; that is why the alternate command is provided.
+
 
+
== Cautions ==
+
=== Reversing Args can cause evidence erasure ===
+
Use extreme care when typing the command line for this program. Reversing the <tt>if</tt> and <tt>of</tt> flags will cause the computer to erase your evidence!
+
 
+
=== Use extreme caution if reading from a tape drive ===
+
At least with Linux/UNIX, tape drives differ functionally from disks in ways that make them more complex to "image".  Specifically, tape media carry EOF and EOT marks, which have no counterpart on disks.
+
 
+
Most commercial backup software uses EOF separators to allow a single tape to hold multiple backup sessions.
+
 
+
backup1-- EOF -- backup2 -- EOF -- backup3 -- EOT
+
 
+
A simple dd if=/dev/st0 of=image.dd will only preserve the first backup session.
+
 
+
For testing, from Linux you can create a multi-session backup tape via:
+
 
+
mt rewind -f /dev/st0
+
tar -cf /dev/nst0 /home
+
tar -cf /dev/nst0 /srv
+
 
+
The nst device driver considers the closing of /dev/nst0 to signal the
+
end of a tape file, so it appends an EOF mark after each invocation of
+
tar.
+
 
+
So the tape would have:
+
home_tar_archive -- EOF -- srv_tar_archive -- EOF -- EOT
+
 
+
If you start reading from the start of the tape with either dd or tar,
+
they will stop when the first EOF is hit and thus will only extract the home archive and will miss the srv archive.
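One way to capture every session is to keep reading past each EOF mark. The Python sketch below assumes the behaviour documented for the Linux st driver: a read at an EOF mark returns zero bytes, the next read returns data from the following tape file, and two consecutive zero-length reads mean the end of recorded data. The function name, the default block size and the in-memory result are assumptions for illustration, and the sketch has not been tested against real tape hardware.

```python
def read_all_tape_files(read_block, block_size=65536):
    """Collect the contents of every tape file up to the end of recorded data.

    read_block -- a callable such as the read method of an unbuffered file
                  opened on a non-rewinding device like /dev/nst0 (assumption)
    Returns a list with one bytes object per tape file. A real tool would
    stream each file to disk instead of holding it in memory.
    """
    files = []
    current = bytearray()
    at_mark = False  # True right after an EOF mark was seen
    while True:
        block = read_block(block_size)
        if block:
            current += block
            at_mark = False
        elif at_mark:
            break  # second zero-length read in a row: end of data
        else:
            files.append(bytes(current))  # an EOF mark closes the current file
            current = bytearray()
            at_mark = True
    return files
```

On a real system this might be called as <tt>read_all_tape_files(open("/dev/nst0", "rb", buffering=0).read)</tt>. For the test tape above it would be expected to return two entries, the home archive and the srv archive.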
+
 
+
== See also ==
+
 
+
* [[aimage]]
+
* [[Blackbag]]
+
* [[dc3dd]]
+
* [[dcfldd]]
+
* [[dd_rescue]]
+
* [[ddrescue]]
+
* [[sdd]]
+
* [[sg_dd]]
+
* [[mdd]]
+
 
+
== External Links ==
+
 
+
* [http://www.linuxjournal.com/article/1320 LinuxJournal article about dd]
+
* [http://users.erols.com/gmgarner/forensics/ Windows Version of dd and other forensics tools]
+

Latest revision as of 13:13, 22 November 2013

Digital Forensics XML (DFXML) is the effort to create an XML schema to allow for easy interoperability between different forensic tools.

There is a Digital Forensics XML standard schema that lets one validate a DFXML document with the xmllint utility.

Development on DFXML to date has involved creating a set of tools that can produce or ingest XML with a common set of tags. It would be nice to have a more aggressive effort, but to date there has not been sufficient funding.

Given this state of affairs, our current strategy is to:

  • Develop a set of standardized tags and data representations for current XML tools.
  • Modify our tools to produce XML similar to the sample XML.
  • Develop a DTD and schema to allow XML validation.


Tools

Tools that produce DFXML

If you want to work with DFXML, you may wish to start with the DFXML package on github.

The following tools are known to produce DFXML:

  • The fiwalk C++ program produces DFXML for files from disk images using SleuthKit.
  • frag_find, the hash-based carver, uses DFXML to document where files physically reside in the disk image.
  • photorec, the popular carver, uses DFXML to document its configuration and where files physically reside in the disk image.
  • bulk_extractor uses DFXML to report the configuration of each run and the provenance of the input files.
  • afxml, part of AFFLIB, converts metadata for disk images into DFXML format.
  • ewfinfo, part of libewf, can output metadata for EWF disk images in DFXML format.
  • md5deep, sha1deep, hashdeep, and the other programs in the md5deep package produce DFXML hash files when provided with the -d option.
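As a rough idea of what a DFXML hash file contains, the Python sketch below builds one with the standard library. The element names (<tt>dfxml</tt>, <tt>fileobject</tt>, <tt>filename</tt>, <tt>filesize</tt>, <tt>hashdigest</tt>) are assumptions based on commonly seen DFXML output and are not taken from this page; the schema linked above is the authority on the real format. The function name is hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def hashes_to_dfxml(files):
    """Return DFXML-style XML text for a mapping of filename -> file content (bytes).

    Assumption: one <fileobject> per file, holding its name, size and MD5.
    """
    root = ET.Element("dfxml")
    for name, content in files.items():
        fileobject = ET.SubElement(root, "fileobject")
        ET.SubElement(fileobject, "filename").text = name
        ET.SubElement(fileobject, "filesize").text = str(len(content))
        digest = ET.SubElement(fileobject, "hashdigest", {"type": "MD5"})
        digest.text = hashlib.md5(content).hexdigest()
    return ET.tostring(root, encoding="unicode")
```

The output of a sketch like this should be validated against the schema before other tools are expected to accept it.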

Tools that consume DFXML

The following tools are known to consume DFXML:

  • frag_find, the hash-based carver, will be able to read piecewise hash files in DFXML format.
  • iblkfind.py, part of the fiwalk distribution, will report the file associated with any disk sector.
  • identify_filenames.py, part of the bulk_extractor distribution, will take a bulk_extractor feature file and annotate it with the names of the files from which each feature was extracted.
  • idifference.py, part of the fiwalk distribution, will report the difference between two disk images.
  • imap.py, part of the fiwalk distribution, will draw a map of what's on a disk. Only useful for small partitions.
  • imicrosoft_redact.py, part of the fiwalk distribution, will break the Microsoft binaries in a disk image.
  • iverify.py, part of the fiwalk distribution, will verify that the contents of files in a DFXML file haven't been changed.
  • The dfxml gem for Ruby, mostly used to process fiwalk output
  • Gumshoe, a Ruby/Solr-based search interface for metadata extracted from disk images
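A consumer can be as small as a loop over the file entries in a DFXML document. The Python sketch below uses only the standard library and does not use the dfxml.py API. The sample document and its element names (<tt>fileobject</tt>, <tt>filename</tt>, <tt>filesize</tt>, <tt>hashdigest</tt>) are illustrative assumptions, and namespace prefixes are stripped so that namespaced and plain documents are handled alike.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; real DFXML output carries more metadata.
SAMPLE = """<dfxml>
  <fileobject>
    <filename>docs/report.txt</filename>
    <filesize>11</filesize>
    <hashdigest type='md5'>5eb63bbbe01eeed093cb22bb8f5acdc3</hashdigest>
  </fileobject>
</dfxml>
"""


def local_name(tag):
    """Drop a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def read_fileobjects(xml_text):
    """Return one dict per <fileobject> with its filename, filesize and hashes."""
    records = []
    for element in ET.fromstring(xml_text).iter():
        if local_name(element.tag) != "fileobject":
            continue
        record = {"hashes": {}}
        for child in element:
            name = local_name(child.tag)
            if name == "filename":
                record["filename"] = child.text
            elif name == "filesize":
                record["filesize"] = int(child.text)
            elif name == "hashdigest":
                record["hashes"][child.get("type", "").lower()] = child.text
        records.append(record)
    return records
```

A verification tool in the spirit of iverify.py could compare each recorded hash with a freshly computed one for the same file.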

Tools that transform DFXML

  • sanitize_xml.py, part of the fiwalk distribution, will remove personally-identifiable information in filenames and directory names from a DFXML file.


DFXML Toolkit

The following toolkits are useful for building new tools that read and write DFXML:

  • The dfxml.py Python module implements objects for reading and writing DFXML.
  • The xml.cpp and xml.h files that are included in the bulk_extractor and md5deep (version 4) source code are a good C++ implementation for DFXML generation.
  • The xml.c and xml.h files that are included in the photorec (new version) source code are a good C implementation for DFXML generation.

This toolkit is now available as a git project on github at https://github.com/simsong/dfxml

XML Forensics Tools and Toolkits

  • We are creating a DFXML strategy for distributing hash sets.

DFXML Bibliography

Papers

  1. Garfinkel, S. Digital Forensics XML and the DFXML toolset, Digital Investigation, 2012.
  2. Garfinkel, S. Automating Disk Forensic Processing with SleuthKit, XML and Python, Systematic Approaches to Digital Forensics Engineering (IEEE/SADFE 2009), Oakland, California. (Acceptance rate: 32%, 7/22)

Presentations

  1. Digital Forensic Tool Integration, Simson Garfinkel, December 7, 2011

See Also