Difference between pages "File Carving" and "Hashing"

From ForensicsWiki
(Difference between pages)
(See also)
 
(added Hashkeeper reference, Sun fingerprints, commented Online NSRL downtime)
 
Line 1: Line 1:
'''File Carving,''' or sometimes simply '''Carving,''' is the practice of searching an input for files or other kinds of objects based on content, rather than on metadata. File carving is a powerful tool for recovering files and fragments of files when directory entries are corrupt or missing, as may be the case with old files that have been deleted or when performing an analysis on damaged media. Memory carving is a useful tool for analyzing physical and virtual memory dumps when the memory structures are unknown or have been overwritten.
+
'''Hashing''' is a method for reducing large inputs to a smaller, fixed-size output. In forensics, cryptographic hashing algorithms such as [[MD5]] and [[SHA-1]] are typically used. These functions have a few properties useful to forensics. Other types of hashing, such as [[Context Triggered Piecewise Hashing]], can also be used.
  
 +
== Tools ==
 +
Hundreds of hashing programs exist; a few that are relevant to forensics are:
  
Most file carvers operate by looking for file headers and/or footers, and then "carving out" the blocks between these two boundaries. [[Semantic Carving]] performs carving based on an analysis of the contents of the proposed files.  
+
* [[md5sum]] - Part of the [[GNU]] coreutils suite, this program is standard on many computers.
 +
* [[md5deep]] - Computes hashes, recursively if desired, and can compare the results to known values.
 +
* [[ssdeep]] - Computes and matches [[Context Triggered Piecewise Hashes]].
  
File carving should be done on a [[disk image]], rather than on the original disk.
+
==Hash Databases==
 +
; [[National Software Reference Library]]
 +
: The largest hash database.
 +
; [[Hashkeeper]]
 +
: Maintained by the National Drug Intelligence Center.
 +
; http://sunsolve.sun.com/fileFingerprints.do
 +
: Solaris Fingerprint Database lookup for files distributed by Sun Microsystems
  
File carving tools are listed on the [[Tools:Data_Recovery]] wiki page.
+
==Online NSRL Lookup==
 +
; http://ionrift.ath.cx/nsrl/
 +
: Allows searching of NSRL 2.17 by MD5 or SHA-1. Reportedly the dataset contains 43,103,492 files.
 +
: (Infrequently available, and likely only when the site owner, Jason Spashett, needs to use it himself.)
  
Many carving programs have an option to only look at or near sector boundaries where headers are found. However, searching the entire input can find files that have been embedded into other files, such as [[JPEG]]s being embedded into [[Microsoft]] [[DOC|Word documents]]. This may be considered an advantage or a disadvantage, depending on the circumstances.
+
==MD5 Reverse Hash Services==
 +
Several online services allow you to enter a hash value and find out what the preimage might have been. One way to find these services is to search the web for 'd41d8cd98f00b204e9800998ecf8427e' (the MD5 of the empty string).
  
The majority of file carving programs will only recover files that are contiguous on the media (in other words, files that are not fragmented).
+
Here are some services that we have been able to find:
  
== Fragmented File Recovery ==
+
; http://nz.md5.crysm.net/
[[Simson Garfinkel]] estimated that up to 58% of Outlook files, 17% of JPEGs, and 16% of MS-Word files are fragmented and therefore appear corrupted or missing to a user of traditional data carving. The first file carving programs that can handle fragmented files automatically have now arrived.
+
: MD5 reverse lookup, operated by Stephen D Cope. As of December 2007 this database had 28 million MD5 hashes. The author states that the database is divided into 256 MySQL tables to make the problem more tractable. The database claims to include every two-, three-, and four-digit combination, all dictionary words, and a pile of user-submitted data. The author also states that they are attempting to calculate and index all possible MD5 hashes. That is infeasible: there are 2^128 possible MD5 values.
[[User:PashaPal|A. Pal]], [[User:NasirMemon|N. Memon]] and K. Shanmugasundaram have introduced a technique called [[File_Carving:SmartCarving|SmartCarving]] that can recover fragmented files.
+
  
== File Carving Taxonomy==
+
; http://us.md5.crysm.net/
[[Simson Garfinkel]] and [[Joachim Metz]] have proposed the following file carving taxonomy:
+
: Similar to the NZ server, but with only 16 million MD5 hashes.
  
;Carving
+
; http://md5.benramsey.com
:General term for extracting data (files) out of undifferentiated blocks (raw data), like "carving" a sculpture out of soapstone.
+
: A nice forward and reverse demonstration system, with an XML and AJAX interface.
  
;Block Based Carving
+
; http://www.hashcrack.com/
:Any carving method (algorithm) that analyzes the input on a block-by-block basis to determine if a block is part of a possible output file. This method assumes that each block can only be part of a single file (or embedded file).
+
: Reverse hash lookup of MD5, SHA-1, MySQL, NTLM, and Lanman hashes. Claims 75 million hashes of 13.2 million unique words.
  
;Characteristic Based Carving
+
; http://gdataonline.com/seekhash.php
:Any carving method (algorithm) that analyzes the input on the basis of a characteristic (for example, entropy) to determine if the input is part of a possible output file.
+
: MD5 reverse lookup with approximately 1 million entries.
  
;Header/Footer Carving
+
; http://hash.insidepro.com/
:A method for carving files out of raw data using a distinct header (start of file marker) and footer (end of file marker).
+
: Hash database from InsidePro (MD5, NTLM).
  
;Header/Maximum (file) size Carving
+
; http://www.xmd5.cn/index_en.htm
:A method for carving files out of raw data using a distinct header (start of file marker) and a maximum (file) size. This approach works because many file formats (e.g. JPEG, MP3) do not care if additional junk is appended to the end of a valid file.
+
; http://www.xmd5.org/index_en.htm
 +
: This site is another simple MD5 reverse lookup. It claims a database with "billions" of entries and is mostly used for password cracking. (Who uses straight MD5s for passwords?)
  
;Header/Embedded Length Carving
+
Others:
:A method for carving files out of raw data using a distinct header and a file length (size) that is embedded in the file format.
+
; http://www.md5this.com/
 
+
; http://www.csthis.com/md5/
;File structure based Carving
+
; http://md5.rednoize.com/
:A method for carving files out of raw data using a certain level of knowledge of the internal structure of file types. Garfinkel called this approach "Semantic Carving" in his DFRWS2006 carving challenge submission, while Metz and Mora called the approach "Deep Carving."
+
 
+
;Semantic Carving
+
:A method for carving files based on a linguistic analysis of the file's content. For example, a semantic carver might conclude that six blocks of French in the middle of a long HTML file written in English are a fragment left over from a previously allocated file, and not part of the English-language HTML file.
+
 
+
;Carving with Validation
+
:A method for carving files out of raw data where the carved files are validated using a file type specific validator.
+
 
+
;Fragment Recovery Carving
+
:A carving method in which two or more fragments are reassembled to form the original file or object. Garfinkel previously called this approach "Split Carving."
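The Header/Footer and Header/Maximum (file) size entries in the taxonomy above can be combined in a short sketch. The Python below is only an illustration under stated assumptions: it treats the input as a single in-memory byte string, uses the JPEG start-of-image and end-of-image markers as header and footer, and applies a size cap; the name `carve_jpegs` and the default `max_size` are hypothetical and not taken from any tool on this page. It handles contiguous files only and does no validation.

```python
JPEG_HEADER = b"\xff\xd8\xff"   # JPEG start-of-image marker plus first marker byte
JPEG_FOOTER = b"\xff\xd9"       # JPEG end-of-image marker

def carve_jpegs(data, max_size=10 * 1024 * 1024):
    """Yield (offset, carved_bytes) for each header..footer span in data.

    max_size is an assumed upper bound on how far past the header
    the footer is searched for.
    """
    pos = 0
    while True:
        start = data.find(JPEG_HEADER, pos)
        if start == -1:
            return  # no more headers in the input
        # Look for the footer only within max_size bytes of the header.
        end = data.find(JPEG_FOOTER, start + len(JPEG_HEADER), start + max_size)
        if end == -1:
            pos = start + 1  # no footer in range; resume after this header
            continue
        end += len(JPEG_FOOTER)
        yield start, data[start:end]
        pos = end
```

Because the search stops at the first footer after each header, a JPEG that embeds a thumbnail with its own end-of-image marker would be cut short; this is one reason the taxonomy lists validation as a separate approach.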
+
 
+
== File Carving challenges and test images ==
+
 
+
[http://www.dfrws.org/2006/challenge/ File Carving Challenge] - [[Digital Forensic Research Workshop|DFRWS]] 2006
+
 
+
[http://www.dfrws.org/2007/challenge/ File Carving Challenge] - [[Digital Forensic Research Workshop|DFRWS]] 2007
+
 
+
[http://dftt.sourceforge.net/test6/index.html FAT Undelete Test #1] - Digital Forensics Tool Testing Image (dftt #6)
+
 
+
[http://dftt.sourceforge.net/test7/index.html NTFS Undelete (and leap year) Test #1] - Digital Forensics Tool Testing Image (dftt #7)
+
 
+
[http://dftt.sourceforge.net/test11/index.html Basic Data Carving Test - fat32], Nick Mikus - Digital Forensics Tool Testing Image (dftt #11)
+
 
+
[http://dftt.sourceforge.net/test12/index.html Basic Data Carving Test - ext2],  Nick Mikus - Digital Forensics Tool Testing Image (dftt #12)
+
 
+
== See also ==
+
* [[Tools:Data_Recovery#Carving | File Carving Tools]]
+
* [[File Carving Bibliography]]
+
* [[Carver 2.0 Planning Page]]
+
* [[File Carving:SmartCarving|SmartCarving]]
+
 
+
=Memory Carving=
+

Revision as of 00:50, 4 September 2008

Hashing is a method for reducing large inputs to a smaller, fixed-size output. In forensics, cryptographic hashing algorithms such as MD5 and SHA-1 are typically used. These functions have a few properties useful to forensics. Other types of hashing, such as Context Triggered Piecewise Hashing, can also be used.
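As a minimal sketch of what this looks like in practice, the following Python computes MD5 and SHA-1 digests of a file with the standard `hashlib` module. The function name `hash_file` and the chunk size are illustrative choices, not part of any tool mentioned on this page.

```python
import hashlib

def hash_file(path, algorithms=("md5", "sha1"), chunk_size=1 << 20):
    """Return {algorithm: hex digest} for the file at path.

    The file is read in chunks so that large disk images do not
    have to fit in memory.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

Whatever the size of the input, the MD5 digest is 32 hexadecimal characters and the SHA-1 digest is 40, which is the "fixed-size output" described above.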

Tools

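Hundreds of hashing programs exist; a few that are relevant to forensics are:

md5sum - Part of the GNU coreutils suite, this program is standard on many computers.
md5deep - Computes hashes, recursively if desired, and can compare the results to known values.
ssdeep - Computes and matches Context Triggered Piecewise Hashes.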

Hash Databases

National Software Reference Library
The largest hash database.
Hashkeeper
Maintained by the National Drug Intelligence Center.
http://sunsolve.sun.com/fileFingerprints.do
Solaris Fingerprint Database lookup for files distributed by Sun Microsystems
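Databases like those listed above are typically used to decide whether a file is already known. The sketch below assumes the known hashes are available as plain lines of hexadecimal SHA-1 digests; that input format and the names `load_known_hashes` and `is_known` are hypothetical and do not describe the actual layout of any of these databases.

```python
import hashlib

def load_known_hashes(lines):
    """Build a lookup set from an iterable of hex digest strings.

    Digests are lower-cased so the comparison is case-insensitive;
    blank lines are skipped.
    """
    return {line.strip().lower() for line in lines if line.strip()}

def is_known(data, known_hashes):
    """Return True if the SHA-1 of data appears in known_hashes."""
    return hashlib.sha1(data).hexdigest() in known_hashes
```

A set gives constant-time membership checks on average, which matters when many files are compared against a large list of digests.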

Online NSRL Lookup

http://ionrift.ath.cx/nsrl/
Allows searching of NSRL 2.17 by MD5 or SHA-1. Reportedly the dataset contains 43,103,492 files.
(Infrequently available, and likely only when the site owner, Jason Spashett, needs to use it himself.)

MD5 Reverse Hash Services

Several online services allow you to enter a hash value and find out what the preimage might have been. One way to find these services is to search the web for 'd41d8cd98f00b204e9800998ecf8427e' (the MD5 of the empty string).
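The value quoted above can be checked directly: hashing zero bytes of input with Python's standard `hashlib` module reproduces it. The variable name `empty_md5` is just for illustration.

```python
import hashlib

# MD5 of zero-length input
empty_md5 = hashlib.md5(b"").hexdigest()
print(empty_md5)  # d41d8cd98f00b204e9800998ecf8427e
```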

Here are some services that we have been able to find:

http://nz.md5.crysm.net/
MD5 reverse lookup, operated by Stephen D Cope. As of December 2007 this database had 28 million MD5 hashes. The author states that the database is divided into 256 MySQL tables to make the problem more tractable. The database claims to include every two-, three-, and four-digit combination, all dictionary words, and a pile of user-submitted data. The author also states that they are attempting to calculate and index all possible MD5 hashes. That is infeasible: there are 2^128 possible MD5 values.
http://us.md5.crysm.net/
Similar to the NZ server, but with only 16 million MD5 hashes.
http://md5.benramsey.com
A nice forward and reverse demonstration system, with an XML and AJAX interface.
http://www.hashcrack.com/
Reverse hash lookup of MD5, SHA-1, MySQL, NTLM, and Lanman hashes. Claims 75 million hashes of 13.2 million unique words.
http://gdataonline.com/seekhash.php
MD5 reverse lookup with approximately 1 million entries.
http://hash.insidepro.com/
Hash database from InsidePro (MD5, NTLM).
http://www.xmd5.cn/index_en.htm
http://www.xmd5.org/index_en.htm
This site is another simple MD5 reverse lookup. It claims a database with "billions" of entries and is mostly used for password cracking. (Who uses straight MD5s for passwords?)
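The description of the first service in this list mentions a database split into 256 tables and an attempt to index every possible MD5 value. The arithmetic behind both points can be sketched as follows. Two things here are assumptions rather than facts from the source: that the table is chosen by the first byte of the digest (the page only says there are 256 tables), and that "digit combination" means strings of decimal digits.

```python
import hashlib

MD5_BITS = 128
DIGEST_BYTES = MD5_BITS // 8  # an MD5 digest is 16 bytes

def table_index(md5_hex):
    """Assumed partitioning: the first digest byte selects one of 256 tables."""
    return int(md5_hex[:2], 16)

# Number of distinct MD5 values, and the space needed for the digests alone
total_hashes = 2 ** MD5_BITS
raw_bytes = total_hashes * DIGEST_BYTES

# Count of decimal strings of length two, three, and four (assumed meaning)
digit_strings = sum(10 ** n for n in (2, 3, 4))

print(table_index(hashlib.md5(b"").hexdigest()))  # 212, since the digest starts with d4
print(f"{total_hashes:.3e} possible digests, {raw_bytes:.3e} bytes to store them")
print(digit_strings)  # 11100
```

Under these assumptions the digit strings contribute only 11,100 entries, while the full space has roughly 3.4 × 10^38 values, so a database of tens of millions of entries covers a vanishingly small fraction of it.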

Others:

http://www.md5this.com/
http://www.csthis.com/md5/
http://md5.rednoize.com/