File Format Identification

From ForensicsWiki
Revision as of 15:48, 30 October 2009 by Ahm irf (Talk | contribs)


File Format Identification is the process of determining the format of a sequence of bytes. Operating systems typically do this by file extension or by embedded MIME information. Forensic applications need to identify file types by content, because an extension or embedded type label can be missing, wrong, or deliberately altered.
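The difference between extension-based and content-based identification can be illustrated with a minimal Python sketch. The signature table, the function names, and the mismatch check are assumptions made for illustration only; they are not taken from any tool listed on this page, and a real identifier uses far more signatures and tests.

```python
import mimetypes

# Hypothetical, deliberately tiny signature table: leading bytes -> MIME type.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]


def type_by_extension(name):
    """Return what an extension-based lookup reports (None if unknown)."""
    mime, _encoding = mimetypes.guess_type(name)
    return mime


def type_by_content(data):
    """Return the type whose signature matches the leading bytes, if any."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return None


def extension_mismatch(name, data):
    """True when both lookups give an answer and the answers disagree."""
    by_ext = type_by_extension(name)
    by_content = type_by_content(data)
    return by_ext is not None and by_content is not None and by_ext != by_content
```

For example, `extension_mismatch("notes.pdf", data)` is true when `data` begins with the PNG signature, which is the kind of renamed file an extension-only check would miss.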



libmagic

  • Written in C.
  • Rules are stored in /usr/share/file/magic and compiled at runtime.
  • Powers the Unix “file” command, but you can also call the library directly from a C program.
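The idea of rule-driven identification can be sketched in Python as a table of (offset, expected bytes, description) entries that is checked in order. This is only a simplified model under stated assumptions: the `Rule` type, the `identify` function, and the description strings are hypothetical, and the sketch reproduces neither the actual magic rule syntax nor the library's C API.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    offset: int      # byte position where the test is applied
    expected: bytes  # bytes that must appear at that position
    label: str       # description reported on a match


# Illustrative rules only; real rule files are far larger and more expressive.
RULES = [
    Rule(0, b"GIF87a", "GIF image data, version 87a"),
    Rule(0, b"GIF89a", "GIF image data, version 89a"),
    Rule(257, b"ustar", "POSIX tar archive"),
    Rule(0, b"\x7fELF", "ELF binary"),
]


def identify(data: bytes, rules=RULES) -> Optional[str]:
    """Return the label of the first rule that matches, or None."""
    for rule in rules:
        end = rule.offset + len(rule.expected)
        # Slicing past the end of data yields a shorter slice, so a rule
        # whose offset lies beyond the buffer simply fails to match.
        if data[rule.offset:end] == rule.expected:
            return rule.label
    return None
```

Note that the tar rule tests bytes at offset 257 rather than at the start of the data, which is why a rule needs an offset and not just a prefix.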



Forensic Innovations File Investigator TOOLS

  • Proprietary, but a free trial is available.
  • Available as consumer applications and OEM API.
  • Identifies 3,000+ file types, using multiple methods to maintain high accuracy.
  • Extracts metadata for many of the supported file types.

Stellent/Oracle Outside-In

Forensic Assistant

  • Proprietary.
  • Detects password-protected archives, files produced by some cryptographic programs, Pinch/Zeus binary reports, etc.



The following research papers address the file format identification problem. Most of them concern identifying the format of a few file sectors rather than of an entire file. This bibliography is in chronological order.

  • John Haggerty and Mark Taylor, FORSIGS: Forensic Signature Analysis of the Hard Drive for Multimedia File Fingerprints, IFIP TC11 International Information Security Conference, Sandton, South Africa, 2006.
  • Martin Karresand and Nahid Shahmehri, Oscar -- Using Byte Pairs to Find File Type and Camera Make of Data Fragments, Annual Workshop on Digital Forensics and Incident Analysis, Pontypridd, Wales, UK, pp. 85-94, Springer-Verlag, London, UK, 2006.
  • Robert F. Erbacher and John Mulholland, Identification and Localization of Data Types within Large-Scale File Systems, Proceedings of the 2nd International Workshop on Systematic Approaches to Digital Forensic Engineering, Seattle, WA, April 2007.
  • Mehdi Chehel Amirani, Mohsen Toorani, and Ali Asghar Beheshti Shirazi, A New Approach to Content-based File Type Detection, Proceedings of the 13th IEEE Symposium on Computers and Communications (ISCC'08), pp. 1103-1108, IEEE ComSoc, Marrakech, Morocco, July 2008.
  • Irfan Ahmed, Kyung-suk Lhee, Hyunjung Shin, and ManPyo Hong, Fast File-type Identification, Proceedings of the 25th ACM Symposium on Applied Computing (ACM SAC 2010), ACM, Sierre, Switzerland, March 2010.