Category:Digital Forensics XML

From ForensicsWiki
Revision as of 12:42, 5 June 2011 by Simsong (Talk | contribs)

Digital Forensics XML (DFXML) is an effort to create an XML schema that makes it easy for different forensic tools to interoperate.

Today there is no Digital Forensics XML standard and no fixed schema. Instead, we are slowly creating a set of tools that can produce or ingest XML with a common set of tags. A more aggressive standardization effort would be preferable, but to date there has not been sufficient funding.

Given this state of affairs, our current strategy is to:

  • Develop a set of standardized tags and data representations for current XML tools.
  • Modify our tools to produce XML similar to the sample XML.
  • Develop a DTD and schema to allow XML validation.
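
To make the idea of a common set of tags concrete, the following Python sketch reads a small DFXML-style fragment with the standard library. Because there is no fixed schema, the element names used here (dfxml, fileobject, filename, filesize, hashdigest) and the sample values are assumptions chosen for illustration; the output of a particular tool may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical DFXML-style fragment. The tag names are assumptions for
# illustration only; there is no fixed DFXML schema.
SAMPLE = """<dfxml>
  <fileobject>
    <filename>docs/report.txt</filename>
    <filesize>11</filesize>
    <hashdigest type="md5">5eb63bbbe01eeed093cb22bb8f5acdc3</hashdigest>
  </fileobject>
</dfxml>"""


def read_fileobjects(xml_text):
    """Return one dict per <fileobject> element found in xml_text."""
    root = ET.fromstring(xml_text)
    records = []
    for fo in root.iter("fileobject"):
        # Collect every <hashdigest>, keyed by its type attribute.
        digests = {h.get("type"): (h.text or "").strip()
                   for h in fo.findall("hashdigest")}
        records.append({
            "filename": fo.findtext("filename"),
            "filesize": int(fo.findtext("filesize")),
            "hashes": digests,
        })
    return records


if __name__ == "__main__":
    for record in read_fileobjects(SAMPLE):
        print(record["filename"], record["filesize"], record["hashes"])
```

A tool that agrees on these few tags can consume the output of any other tool that emits them, which is the interoperability the strategy above aims for.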

Tools that produce DFXML

The following tools are known to produce DFXML:

  • The fiwalk C++ program produces DFXML for files from disk images using SleuthKit.
  • The following carvers use (or will soon use) DFXML for their report files to indicate the sectors in which carved objects were found:
    • frag_find (hash-based carver)
    • photorec
    • scalpel
  • bulk_extractor uses DFXML to report the configuration of each run and the provenance of the input files.
  • afxml, part of AFFLIB, converts metadata for disk images into DFXML format.
  • ewfinfo, part of libewf, can output metadata for EWF disk images in DFXML format.
  • md5deep, sha1deep, hashdeep, and the other programs in the md5deep package will produce DFXML hash files in the new Version 4 of the program (currently under development).
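
As a rough sketch of what producing a DFXML hash file could look like, the Python below hashes in-memory data and writes one record per input. It is not the md5deep implementation; the function name hashes_to_dfxml and the element names (dfxml, fileobject, filename, filesize, hashdigest) are assumptions made for this example.

```python
import hashlib
import xml.etree.ElementTree as ET


def hashes_to_dfxml(named_blobs):
    """Build a DFXML-style document from a {name: bytes} mapping.

    The element names are illustrative assumptions, not a fixed schema.
    """
    root = ET.Element("dfxml")
    for name, data in named_blobs.items():
        fo = ET.SubElement(root, "fileobject")
        ET.SubElement(fo, "filename").text = name
        ET.SubElement(fo, "filesize").text = str(len(data))
        # Emit one <hashdigest> element per algorithm.
        for algo in ("md5", "sha1"):
            hd = ET.SubElement(fo, "hashdigest", {"type": algo})
            hd.text = hashlib.new(algo, data).hexdigest()
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(hashes_to_dfxml({"a.txt": b"hello world"}))
```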

Tools that consume DFXML

  • frag_find, the hash-based carver, will be able to read piecewise hash files in DFXML format.
  • iblkfind.py, part of the fiwalk distribution, will report the file associated with any disk sector.
  • identify_filenames.py, part of the bulk_extractor distribution, will take a bulk_extractor feature file and annotate it with the names of the files from which each feature was extracted.
  • idifference.py, part of the fiwalk distribution, will report the difference between two disk images.
  • imap.py, part of the fiwalk distribution, will draw a map of what's on a disk. It is only useful for small partitions.
  • imicrosoft_redact.py, part of the fiwalk distribution, will break the Microsoft binaries in a disk image.
  • iverify.py, part of the fiwalk distribution, will verify that the contents of files in a DFXML file haven't been changed.
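
The kind of lookup a sector-to-file tool performs can be sketched in a few lines of Python. This is not the code of iblkfind.py; it assumes a hypothetical layout in which each fileobject lists byte_run elements with img_offset and len attributes (byte offsets into the image), and it assumes 512-byte sectors.

```python
import xml.etree.ElementTree as ET

SECTOR_SIZE = 512  # assumed sector size in bytes

# Hypothetical DFXML-style input; tag and attribute names are assumptions.
SAMPLE = """<dfxml>
  <fileobject>
    <filename>a.bin</filename>
    <byte_runs><byte_run img_offset="1024" len="1000"/></byte_runs>
  </fileobject>
  <fileobject>
    <filename>b.bin</filename>
    <byte_runs><byte_run img_offset="4096" len="512"/></byte_runs>
  </fileobject>
</dfxml>"""


def files_for_sector(xml_text, sector):
    """Return the names of files with a byte run overlapping the sector."""
    start = sector * SECTOR_SIZE
    end = start + SECTOR_SIZE  # exclusive end of the sector
    hits = []
    for fo in ET.fromstring(xml_text).iter("fileobject"):
        for run in fo.iter("byte_run"):
            off = int(run.get("img_offset"))
            length = int(run.get("len"))
            # Half-open intervals [off, off+length) and [start, end) overlap.
            if off < end and start < off + length:
                hits.append(fo.findtext("filename"))
                break  # one match per file is enough
    return hits


if __name__ == "__main__":
    print(files_for_sector(SAMPLE, 3))
```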

Tools that transform DFXML

  • sanitize_xml.py, part of the fiwalk distribution, will remove personally identifiable information in filenames and directory names from a DFXML file.
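
One possible way to transform a DFXML file in this manner is to replace every path component with a short placeholder derived from a hash, so that identical directory names still map to identical placeholders. The sketch below is not the sanitize_xml.py implementation; the filename tag, the function name sanitize_filenames, and the choice of a truncated SHA-1 are assumptions for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET


def sanitize_filenames(xml_text):
    """Replace each path component in <filename> with a hash placeholder.

    Illustrative only: an unsalted, truncated hash of a short name can be
    guessed, so a real tool would need a stronger scheme.
    """
    root = ET.fromstring(xml_text)
    for el in root.iter("filename"):
        parts = (el.text or "").split("/")
        # Keep empty components so leading slashes survive the rewrite.
        masked = [hashlib.sha1(p.encode("utf-8")).hexdigest()[:8] if p else p
                  for p in parts]
        el.text = "/".join(masked)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = ("<dfxml><fileobject>"
           "<filename>home/alice/notes.txt</filename>"
           "</fileobject></dfxml>")
    print(sanitize_filenames(doc))
```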


DFXML Toolkits

The following toolkits are useful for building new tools that read and write DFXML:

  • The dfxml.py Python module implements objects for reading and writing DFXML.
  • The xml.cpp and xml.h files that are included in the bulk_extractor and md5deep (version 4) source code are a good C++ implementation for DFXML generation.
  • The xml.c and xml.h files that are included in the photorec (new version) source code are a good C implementation for DFXML generation.

XML Forensics Tools and Toolkits

  • We are creating a DFXML strategy for distributing hash sets.

See Also