Difference between pages "Fiwalk" and "Disk Imaging"

From ForensicsWiki
(Difference between pages)
m
 
(Decryption while imaging)
 
Line 1: Line 1:
{{Infobox_Software |
+
{{expand}}
  name = fiwalk |
+
  maintainer = [[Simson Garfinkel]] |
+
  os = {{Linux}}, {{MacOS}}, {{FreeBSD}} |
+
  genre = [[Carving]] |
+
  license = {{Public Domain}} |
+
  website = https://github.com/kfairbanks/sleuthkit
+
}}
+
  
fiwalk is a batch forensics analysis program written in C that uses SleuthKit. The program can output in XML or ARFF formats.
+
Disk imaging is the process of making a bit-by-bit copy of a disk. Imaging (in more general terms) can apply to anything that can be considered a bit-stream, e.g. physical or logical volumes, network streams, etc.
  
==Legacy Distribution==
+
The most straightforward disk imaging method is reading a disk from start to end and writing the data to a [[:Category:Forensics_File_Formats|Forensics image format]].
'''fiwalk''' is a program that processes a disk image using the SleuthKit library and outputs its results in Digital Forensics XML, the Attribute Relationship File Format (ARFF) format used by the Weka Datamining Toolkit, or an easy-to-read textual format.
+
This can be a time-consuming process, especially for disks with a large capacity.
  
The fiwalk source code comes with fiwalk.py, a Python module that makes it easy to create digital forensics programs. Also included are several demonstration programs that use fiwalk.py:
+
The process of disk imaging is also referred to as disk duplication.
;iblkfind.py
+
:Given a disk block in a disk image, this program tells you which file(s) map that sector.
+
;icarvingtruth.py
+
:Given two or more images of the same disk at different points in time, this program reports the files that are present in the earlier images but can only be recovered from the later images using file carving techniques.
+
;idifference.py
+
:Given two or more images of the same disk at different points in time, this program tells you what changes took place between each one.
+
;iextract.py
+
:Allows the extraction of files that match a particular pattern.
+
;igrep.py
+
:Searches every file in a disk image for a particular string. When found, prints the file and the offset within the file at which the string was found.
+
;ihistogram.py
+
:Prints a histogram of file types found in the disk image.
+
;imap.py
+
:Displays a “map” of where files are present in the disk image.
+
;imicrosoft_redact.py
+
:Modifies a disk image of a bootable Microsoft operating system so that the image can no longer be booted and so that any Microsoft copyrighted file in the \Windows directory cannot be executed. This allows the disk image of a Microsoft operating system to be distributed without implicitly violating Microsoft’s copyright.
+
;iredact.py
+
:An experimental disk redaction program which allows the removal of specific files matching specific criteria.
+
;iverify.py
+
:Given a disk image and a previously created XML file, verifies that each file in the DFXML file is still present in the disk image.
+
;sanitize_xml.py
+
:Given a DFXML file, sanitize file names so that no personally identifiable information is leaked if the DFXML file is distributed.
+
  
 +
== Disk Imaging Solutions ==
 +
See: [[:Category:Disk Imaging|Disk Imaging Solutions]]
  
==XML Example==
+
== Common practice ==
<pre>
+
It is common practice to use a [[Write Blockers|Write Blocker]] when imaging a physical disk. The write blocker is an additional measure to prevent write access to the disk.
<?xml version='1.0' encoding='ISO-8859-1'?>
+
<fiwalk xmloutputversion='0.2'>
+
  <metadata
+
  xmlns='http://example.org/myapp/'
+
  xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
+
  xmlns:dc='http://purl.org/dc/elements/1.1/'>
+
    <dc:type>Disk Image</dc:type>
+
  </metadata>
+
  <creator>
+
    <program>fiwalk</program>
+
    <version>0.5.7</version>
+
    <os>Darwin</os>
+
    <library name="tsk" version="3.0.1"></library>
+
    <library name="afflib" version="3.5.2"></library>
+
    <command_line>fiwalk -x /dev/disk2</command_line>
+
  </creator>
+
  <source>
+
    <imagefile>/dev/disk2</imagefile>
+
  </source>
+
<!-- fs start: 512 -->
+
  <volume offset='512'>
+
    <Partition_Offset>512</Partition_Offset>
+
    <block_size>512</block_size>
+
    <ftype>2</ftype>
+
    <ftype_str>fat12</ftype_str>
+
    <block_count>5062</block_count>
+
    <first_block>0</first_block>
+
    <last_block>5061</last_block>
+
    <fileobject>
+
      <filename>README.txt</filename>
+
      <id>2</id>
+
      <filesize>43</filesize>
+
      <partition>1</partition>
+
      <alloc>1</alloc>
+
      <used>1</used>
+
      <inode>6</inode>
+
      <type>1</type>
+
      <mode>511</mode>
+
      <nlink>1</nlink>
+
      <uid>0</uid>
+
      <gid>0</gid>
+
      <mtime>1258916904</mtime>
+
      <atime>1258876800</atime>
+
      <crtime>1258916900</crtime>
+
      <byte_runs>
+
      <run file_offset='0' fs_offset='37376' img_offset='37888' len='43'/>
+
      </byte_runs>
+
      <hashdigest type='md5'>2bbe5c3b554b14ff710a0a2e77ce8c4d</hashdigest>
+
      <hashdigest type='sha1'>b3ccdbe2db1c568e817c25bf516e3bf976a1dea6</hashdigest>
+
    </fileobject>
+
  </volume>
+
<!-- end of volume -->
+
<!-- clock: 0 -->
+
  <runstats>
+
    <user_seconds>0</user_seconds>
+
    <system_seconds>0</system_seconds>
+
    <maxrss>1814528</maxrss>
+
    <reclaims>546</reclaims>
+
    <faults>1</faults>
+
    <swaps>0</swaps>
+
    <inputs>56</inputs>
+
    <outputs>0</outputs>
+
    <stop_time>Sun Nov 22 11:08:36 2009</stop_time>
+
  </runstats>
+
</fiwalk>
+
</pre>
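As a rough illustration of how output like the example above could be consumed, the following sketch reads the fileobject entries with Python's standard library. It does not use fiwalk.py; the function name list_fileobjects and the abbreviated SAMPLE string are assumptions made for this example, and only elements visible in the example above are read.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the example output above (illustration only).
SAMPLE = """<fiwalk xmloutputversion='0.2'>
  <volume offset='512'>
    <fileobject>
      <filename>README.txt</filename>
      <filesize>43</filesize>
      <byte_runs>
        <run file_offset='0' fs_offset='37376' img_offset='37888' len='43'/>
      </byte_runs>
      <hashdigest type='md5'>2bbe5c3b554b14ff710a0a2e77ce8c4d</hashdigest>
      <hashdigest type='sha1'>b3ccdbe2db1c568e817c25bf516e3bf976a1dea6</hashdigest>
    </fileobject>
  </volume>
</fiwalk>"""


def list_fileobjects(xml_text):
    """Return one dict per <fileobject> (hypothetical helper, not part of fiwalk)."""
    root = ET.fromstring(xml_text)
    results = []
    for fo in root.iter("fileobject"):
        # Map hash type (md5, sha1, ...) to its hex digest.
        hashes = {h.get("type"): h.text for h in fo.findall("hashdigest")}
        # Each run locates a piece of the file inside the image.
        runs = [(int(r.get("img_offset")), int(r.get("len")))
                for r in fo.findall("byte_runs/run")]
        results.append({
            "filename": fo.findtext("filename"),
            "filesize": int(fo.findtext("filesize")),
            "hashes": hashes,
            "runs": runs,
        })
    return results


if __name__ == "__main__":
    for entry in list_fileobjects(SAMPLE):
        print(entry["filename"], entry["filesize"], entry["runs"])
```

In the example, img_offset (37888) equals fs_offset (37376) plus the volume offset (512), so the run can be read directly from the image file at that position.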
+
  
==Availability==
+
Also see: [[DCO and HPA|Device Configuration Overlay (DCO) and Host Protected Area (HPA)]]
fiwalk can be downloaded from http://afflib.org/fiwalk
+
  
==See Also==
+
== Integrity ==
* [[fileobject]]
+
When creating a disk image, a [http://en.wikipedia.org/wiki/Cryptographic_hash_function cryptographic hash] of the entire disk is often calculated. Commonly used cryptographic hashes are MD5, SHA1 and SHA256.
* [http://domex.nps.edu/deep/Fiwalk.html fiwalk on the DEEP website]
+
  
[[Category:Digital Forensics XML]]
+
 
 +
By recalculating the integrity hash at a later time, one can determine whether the data in the disk image has been changed. This by itself provides no protection against intentional tampering, but it can indicate that the data was altered, e.g. due to corruption. The integrity hash does not indicate where in the data the alteration has occurred. Therefore some imaging tools and/or formats provide additional integrity checks like:
 +
* A checksum
 +
* Parity data
 +
* [[Piecewise hashing]]
 +
 
 +
== Smart imaging ==
 +
Smart imaging is a combination of techniques to make the imaging process more intelligent.
 +
* Compressed storage
 +
* Deduplication
 +
* Selective imaging
 +
* Decryption while imaging
 +
 
 +
=== Compressed storage ===
 +
 
 +
A common technique to reduce the size of an image file is to compress the data, where the compression method should be [http://en.wikipedia.org/wiki/Lossless_data_compression lossless].
 +
On modern computers with multiple cores, the compression can be done in parallel, reducing the output size without prolonging the imaging process.
 +
Since the write speed of the target disk can be a bottleneck in the imaging process, parallel compression can reduce the total time of the imaging process.
 +
[[Guymager]] was one of the first imaging tools to implement the concept of multi-process compression for the [[Encase image file format]]. This technique is now used by various imaging tools including [http://www.tableau.com/index.php?pageid=products&model=TSW-TIM Tableau Imager (TIM)].
 +
 
 +
Other techniques like storing the data sparsely, using '''empty-block compression''' or '''pattern fill''', can reduce the total time of the imaging process and the resulting image size for new non-encrypted (0-byte filled) disks.
 +
 
 +
=== Deduplication ===
 +
Deduplication is the process of identifying data that occurs more than once on disk and storing it only once in the image.
 +
It is even possible to store the data only once for a whole corpus of images, using techniques like hash based imaging.
 +
 
 +
=== Selective imaging ===
 +
Selective imaging is a technique that copies only certain information from a disk, such as the $MFT on an [[NTFS]] volume, together with the necessary contextual information.
 +
 
 +
[[EnCase]] Logical Evidence Format (LEF) is an example of a selective image, although [[EnCase]] stores only file-related contextual information in the format.
 +
 
 +
=== Decryption while imaging ===
 +
Encrypted data is a worst-case scenario for compression. Because the encryption process should be deterministic, one way to reduce the size of an encrypted image is to store it non-encrypted and compressed, and to encrypt it again on-the-fly if required. Re-encryption should rarely be needed, since the non-encrypted data is what undergoes analysis.
 +
 
 +
== Also see ==
 +
* [[:Category:Forensics_File_Formats|Forensics File Formats]]
 +
* [[Write Blockers]]
 +
* [[Piecewise hashing]]
 +
* [[Memory Imaging]]
 +
 
 +
== External Links ==
 +
* [http://www.tableau.com/pdf/en/Tableau_Forensic_Disk_Perf.pdf Benchmarking Hard Disk Duplication Performance in Forensic Applications], by [[Robert Botchek]]
 +
 
 +
=== Hash based imaging ===
 +
* [http://www.dfrws.org/2010/proceedings/2010-314.pdf Hash based disk imaging using AFF4], by [[Michael Cohen]], [[Bradley Schatz]]
 +
 
 +
[[Category:Disk Imaging]]

Revision as of 05:29, 28 July 2012


Please help to improve this article by expanding it.
Further information might be found on the discussion page.

Disk imaging is the process of making a bit-by-bit copy of a disk. Imaging (in more general terms) can apply to anything that can be considered a bit-stream, e.g. physical or logical volumes, network streams, etc.

The most straightforward disk imaging method is reading a disk from start to end and writing the data to a Forensics image format. This can be a time-consuming process, especially for disks with a large capacity.
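The start-to-end method can be sketched in a few lines of Python. This is a minimal illustration that writes a raw copy only; the function name image_device and the 1 MiB chunk size are assumptions made for this example, and a real acquisition would add error handling for unreadable sectors and would normally go through a write blocker.

```python
def image_device(source_path, image_path, chunk_size=1024 * 1024):
    """Copy source_path to image_path sequentially; return the bytes copied.

    Hypothetical helper for illustration: produces a raw (dd-style) image.
    """
    total = 0
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of the source reached
                break
            dst.write(chunk)
            total += len(chunk)
    return total
```

Reading in fixed-size chunks keeps memory use constant regardless of the capacity of the source.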

The process of disk imaging is also referred to as disk duplication.

Disk Imaging Solutions

See: Disk Imaging Solutions

Common practice

It is common practice to use a Write Blocker when imaging a physical disk. The write blocker is an additional measure to prevent write access to the disk.

Also see: Device Configuration Overlay (DCO) and Host Protected Area (HPA)

Integrity

When creating a disk image, a cryptographic hash of the entire disk is often calculated. Commonly used cryptographic hashes are MD5, SHA1 and SHA256.


By recalculating the integrity hash at a later time, one can determine whether the data in the disk image has been changed. This by itself provides no protection against intentional tampering, but it can indicate that the data was altered, e.g. due to corruption. The integrity hash does not indicate where in the data the alteration has occurred. Therefore some imaging tools and/or formats provide additional integrity checks like:

  • A checksum
  • Parity data
  • Piecewise hashing
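As a rough sketch of the difference between a whole-image hash and piecewise hashing, the following Python example computes both in one pass. The function names hash_image and changed_pieces, the choice of SHA256, and the piece size are assumptions made for this illustration; actual image formats define their own layout for such data.

```python
import hashlib


def hash_image(path, piece_size=1024 * 1024):
    """Return (whole-image SHA256, list of per-piece SHA256 digests)."""
    whole = hashlib.sha256()
    pieces = []
    with open(path, "rb") as f:
        while True:
            piece = f.read(piece_size)
            if not piece:
                break
            whole.update(piece)                               # hash of the entire image
            pieces.append(hashlib.sha256(piece).hexdigest())  # hash of this piece only
    return whole.hexdigest(), pieces


def changed_pieces(old_pieces, new_pieces):
    """Indices of pieces whose digest differs (assumes images of equal size)."""
    return [i for i, (a, b) in enumerate(zip(old_pieces, new_pieces)) if a != b]
```

A mismatch in the whole-image hash only says that something changed, while the per-piece digests narrow the alteration down to a range of piece_size bytes.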

Smart imaging

Smart imaging is a combination of techniques to make the imaging process more intelligent.

  • Compressed storage
  • Deduplication
  • Selective imaging
  • Decryption while imaging

Compressed storage

A common technique to reduce the size of an image file is to compress the data, where the compression method should be lossless. On modern computers with multiple cores, the compression can be done in parallel, reducing the output size without prolonging the imaging process. Since the write speed of the target disk can be a bottleneck in the imaging process, parallel compression can reduce the total time of the imaging process. Guymager was one of the first imaging tools to implement the concept of multi-process compression for the Encase image file format. This technique is now used by various imaging tools including Tableau Imager (TIM).

Other techniques like storing the data sparsely, using empty-block compression or pattern fill, can reduce the total time of the imaging process and the resulting image size for new non-encrypted (0-byte filled) disks.
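The combination of parallel lossless compression and empty-block compression can be sketched as follows. This is an illustration under stated assumptions: the record layout ("zero", length) / ("zlib", payload), the 32 KiB chunk size and the function names are hypothetical and do not correspond to any existing image format or tool. Threads are used for brevity.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 32 * 1024  # assumed chunk size for this sketch


def compress_chunk(chunk):
    """Store an all-zero chunk as its length only; zlib-compress anything else."""
    if chunk == bytes(len(chunk)):
        return ("zero", len(chunk))
    return ("zlib", zlib.compress(chunk))


def compress_image(data, workers=4):
    """Split data into chunks and compress them in parallel, keeping their order."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_chunk, chunks))  # map() preserves input order


def decompress_image(records):
    """Rebuild the original data from the records (lossless round trip)."""
    parts = []
    for kind, payload in records:
        parts.append(bytes(payload) if kind == "zero" else zlib.decompress(payload))
    return b"".join(parts)
```

Because each chunk is compressed independently, the work can be spread over several cores, and zero-filled chunks skip the compressor entirely.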

Deduplication

Deduplication is the process of identifying data that occurs more than once on disk and storing it only once in the image. It is even possible to store the data only once for a whole corpus of images, using techniques like hash based imaging.
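A minimal sketch of block-level deduplication keyed by a hash is shown below. The fixed 4096-byte block size, the use of SHA256 and the function names deduplicate and rebuild are assumptions made for this illustration and are not taken from a specific tool.

```python
import hashlib


def deduplicate(data, block_size=4096):
    """Return (store, layout): unique blocks by digest, plus the block order."""
    store = {}    # digest -> block content, each distinct block stored once
    layout = []   # digests in on-disk order, enough to rebuild the image
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)
        layout.append(digest)
    return store, layout


def rebuild(store, layout):
    """Reassemble the original data from the store and the layout."""
    return b"".join(store[digest] for digest in layout)
```

Sharing one store between several images extends the same idea to a corpus: each image then only needs its own layout.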

Selective imaging

Selective imaging is a technique that copies only certain information from a disk, such as the $MFT on an NTFS volume, together with the necessary contextual information.

EnCase Logical Evidence Format (LEF) is an example of a selective image, although EnCase stores only file-related contextual information in the format.

Decryption while imaging

Encrypted data is a worst-case scenario for compression. Because the encryption process should be deterministic, one way to reduce the size of an encrypted image is to store it non-encrypted and compressed, and to encrypt it again on-the-fly if required. Re-encryption should rarely be needed, since the non-encrypted data is what undergoes analysis.
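The effect on compression can be illustrated with a small Python experiment. Random bytes from os.urandom stand in for encrypted data here, which is an assumption of this sketch (well-encrypted data is expected to be statistically close to random); no actual encryption is performed, and the function name compression_ratio is hypothetical.

```python
import os
import zlib


def compression_ratio(data):
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data)) / len(data)


if __name__ == "__main__":
    plain = bytes(64 * 1024)            # zero-filled stand-in for non-encrypted data
    encrypted = os.urandom(64 * 1024)   # random stand-in for encrypted data
    print("non-encrypted:", compression_ratio(plain))
    print("encrypted-like:", compression_ratio(encrypted))
```

The zero-filled buffer shrinks to a tiny fraction of its size, while the random buffer does not shrink at all, which is why storing the decrypted data is the more compact option.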

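Also see

  • Forensics File Formats
  • Write Blockers
  • Piecewise hashing
  • Memory Imaging

External Links

  • Benchmarking Hard Disk Duplication Performance in Forensic Applications, by Robert Botchek

Hash based imaging

  • Hash based disk imaging using AFF4, by Michael Cohen, Bradley Schatz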