File Format Identification

From ForensicsWiki

Revision as of 19:23, 18 July 2010

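File Format Identification is the process of determining the format of a sequence of bytes. Operating systems typically do this by file extension or by embedded MIME information. Forensic applications need to identify file types by content, because extensions and metadata can be missing, wrong, or deliberately altered, and data recovered from unallocated space may have no file name at all.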
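As an illustration of content-based identification, the sketch below checks a few well-known leading signatures ("magic numbers") and flags files whose extension disagrees with their content. The signature table is a small illustrative subset, and the function names `identify_by_content` and `extension_mismatch` are hypothetical; real tools use far larger rule sets and more than one test per format.

```python
from pathlib import Path
from typing import Optional

# Illustrative subset of well-known leading signatures:
# (magic bytes, format name, extensions normally used for that format)
SIGNATURES = [
    (b"%PDF-", "PDF document", {".pdf"}),
    (b"\x89PNG\r\n\x1a\n", "PNG image", {".png"}),
    (b"\xff\xd8\xff", "JPEG image", {".jpg", ".jpeg"}),
    (b"GIF87a", "GIF image", {".gif"}),
    (b"GIF89a", "GIF image", {".gif"}),
    (b"PK\x03\x04", "ZIP archive", {".zip", ".docx", ".xlsx", ".jar"}),
]


def identify_by_content(data: bytes) -> Optional[str]:
    """Return the format whose signature starts `data`, or None if unknown."""
    for magic, name, _ in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


def extension_mismatch(filename: str, data: bytes) -> bool:
    """True if the content is recognized but the file extension disagrees."""
    ext = Path(filename).suffix.lower()
    for magic, _, extensions in SIGNATURES:
        if data.startswith(magic):
            return ext not in extensions
    return False  # unrecognized content: nothing to compare against


if __name__ == "__main__":
    png = b"\x89PNG\r\n\x1a\n" + bytes(8)
    print(identify_by_content(png))                # PNG image
    print(extension_mismatch("holiday.txt", png))  # True
```

A PNG renamed to holiday.txt is still recognized by its first eight bytes, which is exactly the case where trusting the extension fails.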

Tools

libmagic

  • Written in C.
  • Rules are stored in /usr/share/file/magic and compiled at runtime.
  • Powers the Unix “file” command, but you can also call the library directly from a C program.
  • http://sourceforge.net/projects/libmagic
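To show why a rule file is useful, the toy sketch below parses rule lines loosely modeled on the magic(5) format (offset, type, value, description) and matches them against a buffer. This is not libmagic's API or its full rule language, which also supports numeric types, masks, indirect offsets and continuation levels; the rule text and the helpers `parse_rules` and `match` are hypothetical. It does show that a signature need not start at offset 0: the POSIX tar magic "ustar" sits at offset 257.

```python
from typing import List, Optional, Tuple

# Toy rules loosely modeled on magic(5): "<offset> string <value> <description>".
RULES_TEXT = r"""
# offset  type    value      description
0         string  %PDF-      PDF document
0         string  \177ELF    ELF executable
257       string  ustar      POSIX tar archive
"""

Rule = Tuple[int, bytes, str]


def parse_rules(text: str) -> List[Rule]:
    """Compile rule lines into (offset, value_bytes, description) tuples."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        offset, rule_type, value, description = line.split(None, 3)
        if rule_type != "string":
            raise ValueError(f"unsupported rule type: {rule_type}")
        # Decode C-style escapes such as \177 into raw bytes.
        raw = value.encode("latin-1").decode("unicode_escape").encode("latin-1")
        rules.append((int(offset, 0), raw, description))
    return rules


def match(rules: List[Rule], data: bytes) -> Optional[str]:
    """Return the description of the first rule whose bytes appear at its offset."""
    for offset, raw, description in rules:
        if data[offset:offset + len(raw)] == raw:
            return description
    return None
```

For example, `match(parse_rules(RULES_TEXT), bytes(257) + b"ustar")` returns "POSIX tar archive". Parsing the rules once and reusing the compiled list mirrors, in miniature, the idea of loading and compiling the rule file before testing any data.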

DROID

TrID

Forensic Innovations File Investigator TOOLS

  • Proprietary, but free trial available.
  • Available as consumer applications and OEM API.
  • Identifies 3,000+ file types, using multiple methods to maintain high accuracy.
  • Extracts metadata for many of the supported file types.
  • http://www.forensicinnovations.com/fitools.html

Stellent/Oracle Outside-In

Forensic Assistant

  • Proprietary.
  • Provides detection of password-protected archives, some files of cryptographic programs, Pinch/Zeus binary reports, etc.
  • http://nhtcu.ru/0xFA_eng.html
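Forensic Assistant's detection logic is proprietary and is not reproduced here. As a generic illustration only, one cheap check for a password-protected ZIP archive reads the general purpose bit flag of a local file header: per PKWARE's ZIP specification (APPNOTE.TXT), bit 0 set marks an encrypted entry. The function name is hypothetical, and the sketch inspects only the first entry of the archive.

```python
import struct

ZIP_LOCAL_HEADER = b"PK\x03\x04"  # local file header signature 0x04034b50, little-endian


def first_zip_entry_encrypted(data: bytes) -> bool:
    """Report whether the first entry of a ZIP stream is marked as encrypted.

    The local file header starts with: signature (4 bytes), version needed
    to extract (2 bytes), general purpose bit flag (2 bytes, little-endian).
    Bit 0 of that flag marks an encrypted entry.
    """
    if len(data) < 8 or not data.startswith(ZIP_LOCAL_HEADER):
        raise ValueError("data does not start with a ZIP local file header")
    (flags,) = struct.unpack_from("<H", data, 6)
    return bool(flags & 0x0001)
```

A real tool would walk every entry (or the central directory), since an archive can mix encrypted and unencrypted members.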

Data Sets

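If you are working in the field of file format identification, please consider reporting the results of your algorithm on one of these publicly available data sets:

  • NPS govdocs1m: a corpus of 1 million files that can be redistributed without copyright or PII (personally identifiable information) concerns. Download from http://domex.nps.edu/corp/files/govdocs1/
  • The NPS Disk Corpus: a corpus of realistic disk images that contain no PII. Information is at http://digitalcorpora.org/?s=nps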

Bibliography

Research papers on the file format identification problem. Most of these papers address identifying the format of a few disk sectors (a file fragment), rather than of an entire file. Please note that this bibliography is in chronological order.


2001
2003
2005
  • Wei-Jen Li, Ke Wang, S. J. Stolfo, and B. Herzog, "Fileprints: Identifying File Types by n-gram Analysis," Proceedings of the 2005 IEEE Workshop on Information Assurance, 2005. (Presentation Slides) (PDF)
  • Douglas J. Hickok, Daine Richard Lesniak, and Michael C. Rowe, "File Type Detection Technology," 2005 Midwest Instruction and Computing Symposium. (PDF)
2006
  • Martin Karresand and Nahid Shahmehri, "File Type Identification of Data Fragments by Their Binary Structure," Proceedings of the IEEE Workshop on Information Assurance, pp. 140-147, 2006. (Presentation Slides)
  • Gregory A. Hall, "Sliding Window Measurement for File Type Identification," Computer Forensics and Intrusion Analysis Group, ManTech Security and Mission Assurance, 2006. (PDF)
  • John Haggerty and Mark Taylor, "FORSIGS: Forensic Signature Analysis of the Hard Drive for Multimedia File Fingerprints," IFIP TC11 International Information Security Conference, Sandton, South Africa, 2006.
  • Martin Karresand and Nahid Shahmehri, "Oscar -- Using Byte Pairs to Find File Type and Camera Make of Data Fragments," Annual Workshop on Digital Forensics and Incident Analysis, Pontypridd, Wales, UK, pp. 85-94, Springer-Verlag, 2006.
2007
  • Robert F. Erbacher and John Mulholland, "Identification and Localization of Data Types within Large-Scale File Systems," Proceedings of the 2nd International Workshop on Systematic Approaches to Digital Forensic Engineering, Seattle, WA, April 2007.
  • Ryan M. Harris, "Using Artificial Neural Networks for Forensic File Type Identification," Master's Thesis, Purdue University, May 2007. (PDF)
2008
  • William Calhoun and Drue Coles, "Predicting the Types of File Fragments," DFRWS 2008. (Presentation Slides) (PDF)
  • Sarah J. Moody and Robert F. Erbacher, "SÁDI – Statistical Analysis for Data type Identification," 3rd International Workshop on Systematic Approaches to Digital Forensic Engineering, 2008.
  • Mehdi Chehel Amirani, Mohsen Toorani, and Ali Asghar Beheshti Shirazi, "A New Approach to Content-based File Type Detection," Proceedings of the 13th IEEE Symposium on Computers and Communications (ISCC'08), pp. 1103-1108, July 2008. (PDF)
2009
  • Vassil Roussev and Simson Garfinkel, "File Fragment Classification: The Case for Specialized Approaches," Systematic Approaches to Digital Forensic Engineering (IEEE/SADFE 2009), Oakland, California. (PDF)
2010
  • Irfan Ahmed, Kyung-suk Lhee, Hyunjung Shin, and ManPyo Hong, "Fast File-type Identification," Proceedings of the 25th ACM Symposium on Applied Computing (ACM SAC 2010), ACM, Sierre, Switzerland, March 2010.
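Many of the papers above classify fragments by statistics of their byte values rather than by header signatures, since a fragment from the middle of a file usually has no header. The sketch below illustrates the simplest form of that idea: a 1-gram (byte-frequency) profile per fragment, an average profile (centroid) per known type, and nearest-centroid classification by L1 distance. It is a simplified illustration, not the method of any particular paper listed; the published approaches use richer features and distance measures, and the function names and training data here are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def byte_histogram(fragment: bytes) -> List[float]:
    """Relative frequency of each of the 256 byte values (a 1-gram profile)."""
    counts = Counter(fragment)
    total = len(fragment) or 1  # avoid dividing by zero for empty input
    return [counts[b] / total for b in range(256)]


def centroid(fragments: Iterable[bytes]) -> List[float]:
    """Average the 1-gram profiles of training fragments of one known type."""
    profiles = [byte_histogram(f) for f in fragments]
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def classify(fragment: bytes, centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is nearest in L1 (Manhattan) distance."""
    profile = byte_histogram(fragment)

    def distance(label: str) -> float:
        return sum(abs(p - c) for p, c in zip(profile, centroids[label]))

    return min(centroids, key=distance)
```

Usage: build `centroids = {"text": centroid(text_samples), "binary": centroid(binary_samples)}` from fragments of known type, then call `classify(unknown_fragment, centroids)`. The result is only as good as the training corpus, which is one reason shared data sets such as govdocs1 matter for comparing methods.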