File Format Identification

From ForensicsWiki
Revision as of 03:16, 20 October 2008 by Simsong (Talk | contribs) (Bibliography)

File Format Identification is the process of determining the format of a sequence of bytes. Operating systems typically infer the format from the file extension or from embedded MIME information. Forensic applications, by contrast, need to identify file types by content, because extensions and metadata can be missing, wrong, or deliberately altered.
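As a rough illustration of identification by content, the sketch below compares the leading bytes of a buffer against a small table of well-known signatures ("magic numbers"). The table, the labels, and the function name `identify` are assumptions made for this example; real tools use far larger rule sets.

```python
from typing import Optional

# Illustrative signature table: (offset, magic bytes, label).
# Only a handful of widely documented signatures are listed here.
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0, b"\xff\xd8\xff", "JPEG image"),
    (0, b"GIF87a", "GIF image"),
    (0, b"GIF89a", "GIF image"),
    (0, b"%PDF-", "PDF document"),
    (0, b"PK\x03\x04", "ZIP archive"),
]


def identify(data: bytes) -> Optional[str]:
    """Return the label of the first matching signature, or None."""
    for offset, magic, label in SIGNATURES:
        # Slicing past the end of `data` yields a shorter slice, so a
        # buffer that is too short simply fails the comparison.
        if data[offset:offset + len(magic)] == magic:
            return label
    return None


if __name__ == "__main__":
    print(identify(b"%PDF-1.4\n"))
    print(identify(b"no known signature here"))
```

A fixed signature only identifies the container: formats built on ZIP, for example, all begin with the same bytes, so a match on `PK\x03\x04` does not say which of them the data is.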



libmagic

  • Written in C.
  • Rules are stored in /usr/share/file/magic and compiled at runtime.
  • Powers the Unix “file” command, but can also be called directly from a C program.
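For scripted use, one option is to invoke the “file” command rather than link against the library. The sketch below does so from Python; it assumes a `file` binary on the PATH that accepts the `--brief` and `--mime-type` options, and the helper name `mime_type_via_file` is hypothetical. It returns None when the command is unavailable or fails.

```python
import os
import shutil
import subprocess
from typing import Optional


def mime_type_via_file(path: str) -> Optional[str]:
    """Return the MIME type reported by the `file` command, or None."""
    exe = shutil.which("file")
    if exe is None:
        return None  # `file` is not installed on this system
    if not os.path.isfile(path):
        return None  # avoid parsing an error message as a type
    try:
        result = subprocess.run(
            # "--" stops option parsing, so a path starting with "-" is safe.
            [exe, "--brief", "--mime-type", "--", path],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    print(mime_type_via_file(__file__))
```

Spawning a process per file is slow for large collections; calling the library directly, as the list above notes is possible from C, avoids that overhead.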



Stellent/Oracle Outside-In


Current research papers on the file format identification problem are listed below. Most of them address identifying the format from a few file sectors rather than from an entire file.

Predicting the Types of File Fragments, William Calhoun, Drue Coles, DFRWS 2008 [slides]
File type identification of data fragments by their binary structure, Martin Karresand, Nahid Shahmehri, Proceedings of the IEEE Workshop on Information Assurance, 2006, pp. 140–147 [slides]
Fileprints: identifying file types by n-gram analysis, Wei-Jen Li, Ke Wang, S. J. Stolfo, B. Herzog, Proceedings of the 2005 IEEE Workshop on Information Assurance, 2005 [slides]
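Fragment-based identification has no header to inspect, so it must rely on statistics of the bytes themselves. The sketch below is a deliberately simplified illustration of the 1-gram (byte frequency) idea: it builds a normalized byte histogram for a fragment and picks the nearest reference histogram. It is not the method of any paper listed above; the function names, the Euclidean distance, and the toy reference models are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, List


def byte_histogram(fragment: bytes) -> List[float]:
    """Relative frequency of each of the 256 byte values in `fragment`."""
    counts = Counter(fragment)
    total = len(fragment) or 1  # avoid division by zero on empty input
    return [counts.get(value, 0) / total for value in range(256)]


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two histograms (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(fragment: bytes, models: Dict[str, List[float]]) -> str:
    """Return the label whose reference histogram is closest to the fragment."""
    hist = byte_histogram(fragment)
    return min(models, key=lambda label: distance(hist, models[label]))


if __name__ == "__main__":
    # Toy reference models built from synthetic data, for illustration only.
    models = {
        "text": byte_histogram(b"the quick brown fox " * 50),
        "zeros": byte_histogram(bytes(512)),
    }
    print(classify(b"lazy dog jumps over " * 20, models))
```

In practice the reference histograms would be averaged over many known files of each type, and formats with similar byte distributions (compressed or encrypted data, for instance) are hard to separate with single-byte statistics alone.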