Difference between pages "Ssdeep" and "Talk:Tools:File Analysis"

From Forensics Wiki
(Difference between pages)
(Added information on file format)
 
(Are file hash analysis tools suitable for this page?)
 
{{Infobox_Software |
  name = ssdeep |
  maintainer = [[Jesse Kornblum]] |
  os = [[Linux]], [[Windows]], [[Mac OS X]], [[BSD]], [[Solaris]] |
  genre = {{Hashing}} |
  license = {{GPL}} |
  website = [http://ssdeep.sourceforge.net/ ssdeep.sf.net] |
}}
 
ssdeep is a program for computing and matching [[Context Triggered Piecewise Hashing]] values. It is based on a spam detector called [http://samba.org/ftp/unpacked/junkcode/spamsum/ spamsum] by [http://en.wikipedia.org/wiki/Andrew_Tridgell Andrew Tridgell].

== File Format ==
The program uses an ASCII text file to record fuzzy hashes. The format changed slightly in version 2.6 (September 2010). Hashes created by version 2.6 or later of the program cannot be used in earlier versions <sup>[http://ssdeep.svn.sourceforge.net/viewvc/ssdeep/tags/release-2.6/FILEFORMAT?revision=107&view=markup ref]</sup>. The file contains a header followed by one hash per line. The current header is:

<pre>ssdeep,1.1--blocksize:hash:hash,filename</pre>
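As a rough illustration of the layout described above, the following Python sketch splits a signature file into its header version and its per-file records. The names <tt>FuzzyHash</tt> and <tt>parse_signature_file</tt> are made up for this example and are not part of ssdeep. The sketch assumes that the hash fields never contain commas or colons, and it does not handle escaped characters inside the quoted filename. The sample text is the signature from the Truncated Files scenario on this page.

```python
from typing import List, NamedTuple, Tuple


class FuzzyHash(NamedTuple):
    """One record of a signature file (hypothetical helper type)."""
    blocksize: int
    hash1: str
    hash2: str
    filename: str


def parse_signature_file(text: str) -> Tuple[str, List[FuzzyHash]]:
    """Return (format version, records) for the text of a signature file."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("ssdeep,"):
        raise ValueError("missing ssdeep header line")
    # The header looks like "ssdeep,1.1--blocksize:hash:hash,filename".
    version = lines[0].split("--", 1)[0].split(",", 1)[1]
    records = []
    for line in lines[1:]:
        # Assumption: the first comma separates the signature from the filename.
        signature, _, name = line.partition(",")
        blocksize, hash1, hash2 = signature.split(":")
        records.append(
            FuzzyHash(int(blocksize), hash1, hash2, name.strip().strip('"'))
        )
    return version, records


SAMPLE = (
    "ssdeep,1.0--blocksize:hash:hash,filename\n"
    "12582912:fgQl/nUjQAbaBQvHf8yLr5CHJu3dyh+YJ27TuXyphJs3wHC6+rEfAV+wDrw6C/AT"
    ":fPl8cdAUyLr5CHJu3dyh8uzwHC6+reAS,\"all-the-kings-men.avi\"\n"
)

version, records = parse_signature_file(SAMPLE)
print(version, records[0].blocksize, records[0].filename)
```

Keeping the block size as an integer and the two hash strings as separate fields mirrors the <tt>blocksize:hash:hash</tt> column order named in the header.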
 
== Usage Scenarios ==

=== Truncated Files ===

The program can be used to associate two files where one is a truncated version of the other. In this example, the examiner has a file <tt>all-the-kings-men.avi</tt>. She computes a fuzzy hash of this file:
<pre>$ ls -lsh
-rwxr-xr-x 1 jvalenti users 699M Sep 29 2006 all-the-kings-men.avi

$ ssdeep -b all-the-kings-men.avi > sig.txt

$ cat sig.txt
ssdeep,1.0--blocksize:hash:hash,filename
12582912:fgQl/nUjQAbaBQvHf8yLr5CHJu3dyh+YJ27TuXyphJs3wHC6+rEfAV+wDrw6C/AT:fPl8cdAUyLr5CHJu3dyh8uzwHC6+reAS,"all-the-kings-men.avi"</pre>
 
The examiner then creates a second file that contains roughly the first 29% of the original (200 MB of 699 MB). This simulates recovering a partial file in some manner.
 
<pre>$ dd if=all-the-kings-men.avi of=partial.avi bs=1m count=200
200+0 records in
200+0 records out
209715200 bytes transferred in 14.510224 secs (14452926 bytes/sec)

$ ls -lsh partial.avi
-rw-r--r-- 1 jvalenti users 200M Oct 6 06:40 partial.avi</pre>
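For examiners without <tt>dd</tt> at hand, the truncation step above can be approximated with a few lines of Python. This is only a sketch: <tt>copy_prefix</tt> is a hypothetical helper name, and the percentage is computed from the rounded sizes that <tt>ls -lsh</tt> reports, so it is approximate.

```python
def copy_prefix(src, dst, nbytes, chunk_size=1024 * 1024):
    """Copy at most the first nbytes of src into dst; return bytes written."""
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while copied < nbytes:
            block = fin.read(min(chunk_size, nbytes - copied))
            if not block:
                break  # the source is shorter than nbytes
            fout.write(block)
            copied += len(block)
    return copied


# 200 MB kept out of roughly 699 MB (sizes as rounded by ls -lsh).
fraction = 200 / 699
print(f"{fraction:.1%}")  # prints 28.6%

# Hypothetical call mirroring the dd command (200 blocks of 1 MiB):
# copy_prefix("all-the-kings-men.avi", "partial.avi", 200 * 1024 * 1024)
```

Reading in fixed-size chunks keeps memory use flat regardless of how large the source file is.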
 
The examiner can then use the matching mode of ssdeep, the <tt>-m</tt> option, to read the known signature generated above and match it against the partial file.

<pre>$ ssdeep -bm sig.txt partial.avi
partial.avi matches all-the-kings-men.avi (57)</pre>
 
ssdeep reports a match with a score of 57, so the partial file is associated with the original.
 
=== Source Code Reuse ===

The source code for ssdeep was originally derived from another open source project called [[md5deep]]. An examiner with access to both source code directory trees could use ssdeep to find any similarities between the two. In this example we have two folders, <tt>ssdeep-1.1</tt> and <tt>md5deep-1.12</tt>. First we record the fuzzy hashes, with relative filenames (the <tt>-l</tt> switch), to a file:
 
<pre>C:\> ssdeep -lr md5deep-1.12 > hashes.txt</pre>

Then we compare those saved hashes with the other directory:

<pre>C:\> ssdeep -lrm hashes.txt ssdeep-1.1
ssdeep-1.1\cycles.c matches md5deep-1.12\cycles.c (94)
ssdeep-1.1\dig.c matches md5deep-1.12\dig.c (35)
ssdeep-1.1\helpers.c matches md5deep-1.12\helpers.c (57)</pre>
 
These matches indicate source code reuse. A manual examination of the files in question is required to tell exactly what kind of copying occurred, but the tool has saved the examiner a lot of work.

An advanced examiner can accomplish this matching with a single command, but the output will also include all of the matches internal to each directory.
 
<pre>C:\> ssdeep -lrd md5deep-1.12 ssdeep-1.1
md5deep-1.12\md5.h matches md5deep-1.12\cycles.c (27)
md5deep-1.12\sha1.h matches md5deep-1.12\cycles.c (25)
md5deep-1.12\sha1.h matches md5deep-1.12\md5.h (58)
md5deep-1.12\sha256.h matches md5deep-1.12\cycles.c (25)
md5deep-1.12\sha256.h matches md5deep-1.12\md5.h (61)
md5deep-1.12\sha256.h matches md5deep-1.12\sha1.h (57)
md5deep-1.12\tiger.h matches md5deep-1.12\cycles.c (29)
md5deep-1.12\tiger.h matches md5deep-1.12\md5.h (65)
md5deep-1.12\tiger.h matches md5deep-1.12\sha1.h (63)
md5deep-1.12\tiger.h matches md5deep-1.12\sha256.h (61)
ssdeep-1.1\cycles.c matches md5deep-1.12\cycles.c (94)
ssdeep-1.1\dig.c matches md5deep-1.12\dig.c (35)
ssdeep-1.1\helpers.c matches md5deep-1.12\helpers.c (57)</pre>
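The matches internal to each directory can be filtered out after the fact. The Python sketch below is one possible approach, not an ssdeep feature: it parses match lines of the form shown above and keeps only those whose two files sit under different top-level directories. The names <tt>MATCH_LINE</tt>, <tt>top_dir</tt> and <tt>cross_directory_matches</tt> are hypothetical, and the pattern assumes that filenames do not themselves contain the text " matches ".

```python
import re
from typing import List, Tuple

# Assumed line shape, taken from the sample output: "<file> matches <file> (<score>)"
MATCH_LINE = re.compile(r"^(?P<left>.+) matches (?P<right>.+) \((?P<score>\d+)\)$")


def top_dir(path: str) -> str:
    """Return the first path component, accepting both \\ and / separators."""
    return re.split(r"[\\/]", path, maxsplit=1)[0]


def cross_directory_matches(output: str) -> List[Tuple[str, str, int]]:
    """Keep only matches whose two files are in different top-level directories."""
    results = []
    for line in output.splitlines():
        m = MATCH_LINE.match(line.strip())
        if m is None:
            continue  # skip blank lines and anything that is not a match line
        left, right = m.group("left"), m.group("right")
        if top_dir(left) != top_dir(right):
            results.append((left, right, int(m.group("score"))))
    return results


# A few lines copied from the sample output above.
SAMPLE_OUTPUT = r"""
md5deep-1.12\md5.h matches md5deep-1.12\cycles.c (27)
md5deep-1.12\sha1.h matches md5deep-1.12\md5.h (58)
ssdeep-1.1\cycles.c matches md5deep-1.12\cycles.c (94)
ssdeep-1.1\dig.c matches md5deep-1.12\dig.c (35)
ssdeep-1.1\helpers.c matches md5deep-1.12\helpers.c (57)
"""

for left, right, score in cross_directory_matches(SAMPLE_OUTPUT):
    print(f"{left} matches {right} ({score})")
```

On the sample lines this leaves the same three cross-project matches that the two-step <tt>-m</tt> approach reported.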
 
If you'd like to see the matches in both directions (i.e. for two files A and B that match, see that A matches B and B matches A), use the <tt>-p</tt> flag instead of <tt>-d</tt>.

== External Links ==

* [http://ssdeep.sourceforge.net/ Official website]

[[Category:Cross-platform]]

Revision as of 04:41, 29 October 2007

Perhaps a few introductory words as to what kind of file analysis is intended would be helpful. I was looking for a mention of the http://www.fileadvisor.bit9.com service, and could not decide if it was suitable for this page, or if it should go somewhere else. It's a collection of file hashes, very useful for deciding if a file is reasonably well known by its file hash. [[User:Athulin|Athulin]] 02:41, 29 October 2007 (PDT)