Category:Digital Forensics XML

From ForensicsWiki

Revision as of 15:26, 17 September 2013

Digital Forensics XML (DFXML) is an effort to create an XML schema that allows easy interoperability between different forensic tools.

A draft Digital Forensics XML standard and schema is currently in "Request for Comments" status: https://github.com/dfxml-working-group/dfxml_schema

Development on DFXML to date has involved creating a set of tools that can produce or ingest XML with a common set of tags. A more aggressive standardization effort would be desirable, but to date there has not been sufficient funding for one.

Given this state of affairs, our current strategy is to:

  • Develop a set of standardized tags and data representations for current XML tools.
  • Modify our tools to produce XML similar to the sample XML.
  • Develop a DTD and schema to allow XML validation.
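The point of a common tag set is that any tool can read another tool's output with the same few lines of code. The sketch below assumes fiwalk-style elements (<dfxml>, <fileobject>, <filename>, <filesize>, <hashdigest type="...">); the sample document and its placeholder hash are hypothetical, and the draft schema remains the authoritative list of tags. It uses only Python's standard xml.etree.ElementTree and strips any XML namespace so it works on both namespaced and plain files.

```python
import xml.etree.ElementTree as ET

# Hypothetical fiwalk-style sample; tag names and the placeholder hash are illustrative only.
SAMPLE = """<dfxml version="1.0">
  <fileobject>
    <filename>docs/report.txt</filename>
    <filesize>1024</filesize>
    <hashdigest type="md5">0123456789abcdef0123456789abcdef</hashdigest>
  </fileobject>
  <fileobject>
    <filename>img/photo.jpg</filename>
    <filesize>2048</filesize>
  </fileobject>
</dfxml>"""


def local_name(tag):
    """Strip a '{namespace}' prefix so the same code handles namespaced and plain DFXML."""
    return tag.rsplit("}", 1)[-1]


def read_fileobjects(xml_text):
    """Return one dict per <fileobject>, keyed by child tag name."""
    records = []
    for elem in ET.fromstring(xml_text).iter():
        if local_name(elem.tag) != "fileobject":
            continue
        rec = {}
        for child in elem:
            name = local_name(child.tag)
            text = (child.text or "").strip()
            if name == "hashdigest":
                # Key digests by algorithm, e.g. "hash_md5"
                rec["hash_" + child.get("type", "unknown").lower()] = text
            else:
                rec[name] = text
        records.append(rec)
    return records


for rec in read_fileobjects(SAMPLE):
    print(rec)
```

This is not a substitute for DTD or schema validation; it only shows how a shared vocabulary lets a consumer stay simple.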

Tools

Tools that produce DFXML

If you want to work with DFXML, you may wish to start with the DFXML package on GitHub (https://github.com/simsong/dfxml).

The following tools are known to produce DFXML:

  • The fiwalk C++ program produces DFXML for files from disk images using SleuthKit.
  • frag_find, the hash-based carver, uses DFXML to document where files physically reside in the disk image.
  • photorec, the popular carver, uses DFXML to document its configuration and where files physically reside in the disk image.
  • bulk_extractor uses DFXML to report the configuration of each run and the provenance of the input files.
  • afxml, part of AFFLIB, converts metadata for disk images into DFXML format.
  • ewfinfo, part of libewf, can output metadata for EWF disk images in DFXML format.
  • md5deep, sha1deep, hashdeep, and the other programs in the md5deep package produce DFXML hash files when given the -d option.
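As a rough illustration of what a DFXML hash file contains, the following sketch builds one <fileobject> per input with a filename, a size, and MD5/SHA-1 digests. It is not the output format of md5deep or fiwalk; the function name make_hash_dfxml and the lowercase type="md5" / type="sha1" attribute values are assumptions made for this example.

```python
import hashlib
import xml.etree.ElementTree as ET


def make_hash_dfxml(files):
    """Build a DFXML-style document from a mapping of filename -> file contents (bytes).

    Hypothetical helper: element and attribute names are modeled loosely on
    fiwalk/md5deep output and are assumptions, not the official schema.
    """
    root = ET.Element("dfxml", {"version": "1.0"})
    for name, data in files.items():
        fo = ET.SubElement(root, "fileobject")
        ET.SubElement(fo, "filename").text = name
        ET.SubElement(fo, "filesize").text = str(len(data))
        for algo in ("md5", "sha1"):
            # One <hashdigest> element per algorithm, labeled by its type attribute
            digest = ET.SubElement(fo, "hashdigest", {"type": algo})
            digest.text = hashlib.new(algo, data).hexdigest()
    return ET.tostring(root, encoding="unicode")


print(make_hash_dfxml({"hello.txt": b"hello"}))
```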

Tools that consume DFXML

  • dfxml.py and libdfxml (https://github.com/simsong/dfxml), currently available for download on GitHub.
  • frag_find, the hash-based carver, will be able to read piecewise hash files in DFXML format.
  • iblkfind.py, part of the fiwalk distribution, reports the file associated with any disk sector.
  • identify_filenames.py, part of the bulk_extractor distribution, takes a bulk_extractor feature file and annotates it with the names of the files from which each feature was extracted.
  • idifference.py, part of the fiwalk distribution, reports the differences between two disk images.
  • imap.py, part of the fiwalk distribution, draws a map of what is on a disk. It is useful only for small partitions.
  • imicrosoft_redact.py, part of the fiwalk distribution, deliberately breaks the Microsoft binaries in a disk image.
  • iverify.py, part of the fiwalk distribution, verifies that the contents of the files listed in a DFXML file have not been changed.
  • The dfxml gem for Ruby (https://github.com/anarchivist/dfxml), mostly used to process fiwalk output.
  • Gumshoe (https://github.com/anarchivist/gumshoe), a Ruby/Solr-based search interface for metadata extracted from disk images.
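Two of the consumers above reduce to simple operations on the same parsed data: a sector lookup (the idea behind iblkfind.py) and a comparison of two runs (the idea behind idifference.py). The sketch below is not the code of either script. It assumes fiwalk-style <byte_run img_offset="..." len="..."/> elements giving byte offsets within the disk image, 512-byte sectors, and MD5 digests; load_files, file_for_sector, and difference are hypothetical names, and the sample documents are made up.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    return tag.rsplit("}", 1)[-1]


def load_files(xml_text):
    """Map filename -> {'md5': digest or None, 'runs': [(img_offset, length), ...]}."""
    files = {}
    for fo in ET.fromstring(xml_text).iter():
        if local_name(fo.tag) != "fileobject":
            continue
        name, md5, runs = None, None, []
        for el in fo.iter():
            tag = local_name(el.tag)
            if tag == "filename":
                name = (el.text or "").strip()
            elif tag == "hashdigest" and el.get("type", "").lower() == "md5":
                md5 = (el.text or "").strip()
            elif tag == "byte_run" and el.get("img_offset") is not None:
                runs.append((int(el.get("img_offset")), int(el.get("len"))))
        if name is not None:
            files[name] = {"md5": md5, "runs": runs}
    return files


def file_for_sector(files, sector, sector_size=512):
    """Return the file whose byte runs contain the sector's first byte, or None."""
    offset = sector * sector_size
    for name, info in files.items():
        for start, length in info["runs"]:
            if start <= offset < start + length:
                return name
    return None


def difference(old, new):
    """Return (added, removed, changed) filenames; 'changed' means the MD5 differs."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(n for n in set(old) & set(new) if old[n]["md5"] != new[n]["md5"])
    return added, removed, changed


OLD = """<dfxml>
  <fileobject>
    <filename>a.txt</filename>
    <byte_runs><byte_run file_offset="0" img_offset="4096" len="1024"/></byte_runs>
    <hashdigest type="md5">aaaa</hashdigest>
  </fileobject>
  <fileobject><filename>b.txt</filename><hashdigest type="md5">bbbb</hashdigest></fileobject>
</dfxml>"""

NEW = """<dfxml>
  <fileobject><filename>a.txt</filename><hashdigest type="md5">cccc</hashdigest></fileobject>
  <fileobject><filename>c.txt</filename><hashdigest type="md5">dddd</hashdigest></fileobject>
</dfxml>"""

old, new = load_files(OLD), load_files(NEW)
print(file_for_sector(old, 8))
print(difference(old, new))
```

A linear scan is fine for a sketch; a real tool handling millions of byte runs would sort them and use binary search.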

Tools that transform DFXML

  • sanitize_xml.py, part of the fiwalk distribution, removes personally identifiable information in filenames and directory names from a DFXML file.
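One possible way to strip identifying names, shown here only as a sketch and not as what sanitize_xml.py actually does, is to replace each path component of every <filename> with a salted hash while keeping the file extension. The helper names, the SHA-256 choice, the 12-character truncation, and the default salt are all assumptions for illustration; keeping extensions preserves some information, which may or may not be acceptable.

```python
import hashlib
import posixpath
import xml.etree.ElementTree as ET


def pseudonym(component, salt):
    """Replace one path component with a truncated salted hash, keeping its extension."""
    stem, ext = posixpath.splitext(component)
    digest = hashlib.sha256((salt + stem).encode("utf-8")).hexdigest()[:12]
    return digest + ext


def sanitize_filenames(xml_text, salt="change-me"):
    """Rewrite every <filename> so directory and file names are replaced consistently."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if el.tag.rsplit("}", 1)[-1] == "filename" and el.text:
            parts = el.text.split("/")
            # Empty components (leading or doubled slashes) are kept as-is
            el.text = "/".join(pseudonym(p, salt) if p else p for p in parts)
    return ET.tostring(root, encoding="unicode")


doc = "<dfxml><fileobject><filename>Users/alice/Secret Plans.docx</filename></fileobject></dfxml>"
print(sanitize_filenames(doc, salt="example-salt"))
```

Because the same salt always maps a name to the same pseudonym, directory structure survives sanitization. Note that ElementTree rewrites namespace prefixes (e.g. ns0:) on output unless they are registered.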


DFXML Toolkit

The following toolkits are useful for building new tools that read and write DFXML:

  • The dfxml.py Python module implements objects for reading and writing DFXML.
  • The xml.cpp and xml.h files that are included in the bulk_extractor and md5deep (version 4) source code are a good C++ implementation for DFXML generation.
  • The xml.c and xml.h files that are included in the photorec (new version) source code are a good C implementation for DFXML generation.
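DFXML files produced from large disk images can hold millions of <fileobject> elements, so a reader may want to stream rather than load the whole tree. The sketch below uses the standard library's xml.etree.ElementTree.iterparse; it is not the dfxml.py API, and iter_fileobjects is a hypothetical name. It assumes <filename> and <filesize> are direct children of <fileobject>.

```python
import io
import xml.etree.ElementTree as ET


def iter_fileobjects(source):
    """Yield (filename, filesize) for each <fileobject> as soon as it has been parsed.

    `source` is a path or a binary file object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag.rsplit("}", 1)[-1] != "fileobject":
            continue
        fields = {c.tag.rsplit("}", 1)[-1]: (c.text or "").strip() for c in elem}
        size = fields.get("filesize")
        yield fields.get("filename"), int(size) if size else None
        # Drop this fileobject's children; the emptied element stays attached to its parent.
        elem.clear()


data = b"<dfxml><volume><fileobject><filename>a</filename><filesize>3</filesize></fileobject></volume></dfxml>"
for name, size in iter_fileobjects(io.BytesIO(data)):
    print(name, size)
```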

This toolkit is now available as a Git project on GitHub: https://github.com/simsong/dfxml

XML Forensics Tools and Toolkits

  • We are creating a DFXML strategy for distributing hash sets.

DFXML Bibliography

Papers

  1. Garfinkel, S. Digital Forensics XML and the DFXML toolset. Digital Investigation, 2012. http://simson.net/clips/academic/2012.DI.dfxml.pdf
  2. Garfinkel, S. Automating Disk Forensic Processing with SleuthKit, XML and Python. Systematic Approaches to Digital Forensics Engineering (IEEE/SADFE 2009), Oakland, California. (Acceptance rate: 32%, 7/22.) http://simson.net/clips/academic/2009.SADFE.xml_forensics.pdf

Presentations

  1. Garfinkel, S. Digital Forensic Tool Integration. December 7, 2011. http://simson.net/ref/2011/2011-12-07%20DFXML.pdf

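See Also

  • XML Log Sample for photorec: http://www.cgsecurity.org/wiki/Data_Carving_Log
  • TrIDScan (http://mark0.net/soft-tridscan-e.html), which has an XML language to describe file types.
  • DFXML toolkit on GitHub: https://github.com/simsong/dfxml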