File Format Identification
File format identification is the process of determining the format of a sequence of bytes. Operating systems typically do this by file extension or by embedded MIME information. Forensic applications need to identify file types by content, because an extension or MIME label can be missing, wrong, or deliberately altered.
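As a small illustration of extension-based identification, the sketch below uses Python's standard `mimetypes` module, which guesses a type from the file name alone. The helper name `guess_by_extension` and the file names are hypothetical; the point is only that the answer never depends on the file's bytes.

```python
import mimetypes

def guess_by_extension(filename):
    """Return the MIME type implied by the file name alone, or None."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type

# Hypothetical file names: the guess depends only on the extension,
# so renaming a file changes the answer without touching its content.
print(guess_by_extension("holiday.jpg"))
print(guess_by_extension("holiday.txt"))
```

A file with no extension yields `None`, which is one reason content-based methods are needed for carved or renamed data.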
libmagic
- Written in C.
- Rules are stored in /usr/share/file/magic and compiled at runtime.
- Powers the Unix “file” command, but you can also call the library directly from a C program.
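The rule-based approach described above can be sketched in a few lines of Python. This is a toy stand-in, not the real library's API or rule syntax: the `SIGNATURES` table and the functions `identify` and `identify_file` are hypothetical names, and the table holds only a handful of well-known signatures.

```python
# Toy signature table: (offset, magic bytes, label).
# Illustrative only; a real magic database is far larger and
# supports much richer tests than a fixed byte comparison.
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0, b"\xff\xd8\xff", "JPEG image"),
    (0, b"%PDF-", "PDF document"),
    (0, b"GIF87a", "GIF image"),
    (0, b"GIF89a", "GIF image"),
    (0, b"PK\x03\x04", "ZIP archive"),
    (257, b"ustar", "tar archive"),
]

def identify(data):
    """Return the label of the first matching signature, or 'data'."""
    for offset, magic, label in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return "data"

def identify_file(path, read_size=512):
    """Identify a file from its first read_size bytes."""
    with open(path, "rb") as handle:
        return identify(handle.read(read_size))
```

Reading only the first 512 bytes is an assumption that suits these particular signatures; rules that test deeper offsets would need a larger read.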
DROID
- Written in Java.
- Developed by the National Archives of the United Kingdom.
- XML config file
- Closed source; free for non-commercial use
Forensic Innovations File Investigator TOOLS
- Proprietary, but free trial available.
- Available as consumer applications and OEM API.
- Identifies 3,000+ file types, using multiple methods to maintain high accuracy.
- Extracts metadata for many of the supported file types.
- Proprietary but free demo.
- Provides detection of password protected archives, some files of cryptographic programs, Pinch/Zeus binary reports, etc.
Bibliography
Current research papers on the file format identification problem. Most of these papers concern identifying the file format from a few file sectors rather than from an entire file. Note that this bibliography is in chronological order.
- Mason McDaniel, Automatic File Type Detection Algorithm, Master's Thesis, James Madison University, 2001.
- Content Based File Type Detection Algorithms, Mason McDaniel and M. Hossain Heydari, 36th Annual Hawaii International Conference on System Sciences (HICSS'03) - Track 9, 2003.
- Fileprints: identifying file types by n-gram analysis, Li Wei-Jen, Wang Ke, Stolfo SJ, Herzog B., Proceedings of the 2005 IEEE Workshop on Information Assurance; 2005 [slides]
- File Type Detection Technology, Douglas J. Hickok, Daine Richard Lesniak, Michael C. Rowe, 2005 Midwest Instruction and Computing Symposium.
- File type identification of data fragments by their binary structure, Karresand Martin, Shahmehri Nahid. Proceedings of the IEEE Workshop on Information Assurance; 2006. p. 140–7. [slides]
- Sliding Window Measurement for File Type Identification, Gregory A. Hall, Ph.D., Computer Forensics and Intrusion Analysis Group, ManTech Security and Mission Assurance, 2006
- FORSIGS; Forensic Signature Analysis of the Hard Drive for Multimedia File Fingerprints, John Haggerty and Mark Taylor, IFIP TC11 International Information Security Conference, 2006, Sandton, South Africa.
- Oscar -- Using Byte Pairs to Find File Type and Camera Make of Data Fragments, Martin Karresand, Nahid Shahmehri, Annual Workshop on Digital Forensics and Incident Analysis (2006: Pontypridd, Wales, UK), pp. 85-94, London, UK: Springer-Verlag, 2006.
- Karresand M., Shahmehri N., Oscar: File Type Identification of Binary Data in Disk Clusters and RAM Pages, Proceedings of IFIP International Information Security Conference: Security and Privacy in Dynamic Environments (SEC2006), Springer, ISBN 0-387-33405-x, pp. 413-424, May 22-24, 2006, Karlstad, Sweden.
- "Identification and Localization of Data Types within Large-Scale File Systems," Robert F. Erbacher and John Mulholland, Proceedings of the 2nd International Workshop on Systematic Approaches to Digital Forensic Engineering, Seattle, WA, April 2007.
- Using Artificial Neural Networks for Forensic File Type Identification, Ryan M. Harris, Master's Thesis, Purdue University, May 2007
- SÁDI – Statistical Analysis for Data type Identification, Sarah J. Moody and Robert F. Erbacher, 3rd International Workshop on Systematic Approaches to Digital Forensic Engineering, 2008.
- Mehdi Chehel Amirani, Mohsen Toorani, and Ali Asghar Beheshti Shirazi, A New Approach to Content-based File Type Detection, Proceedings of the 13th IEEE Symposium on Computers and Communications (ISCC'08), pp.1103-1108, IEEE ComSoc, Marrakech, Morocco, July 2008.
- Roussev, Vassil, and Garfinkel, Simson, File Fragment Classification: The Case for Specialized Approaches, Systematic Approaches to Digital Forensics Engineering (IEEE/SADFE 2009), Oakland, California.
- Irfan Ahmed, Kyung-suk Lhee, Hyunjung Shin and ManPyo Hong, On improving the accuracy and performance of content-based file type identification, Proceedings of the 14th Australasian Conference on Information Security and Privacy (ACISP 2009), pp.44-59, LNCS (Springer), Brisbane, Australia, July 2009
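Several of the papers listed above classify fragments by byte-level statistics rather than by fixed signatures. The following is a deliberately simplified sketch of that idea, not the method of any particular paper: it builds a 1-gram (single byte) frequency histogram and assigns a fragment to the nearest reference histogram by Manhattan distance. All function names and the two tiny reference samples are hypothetical.

```python
from collections import Counter

def byte_histogram(data):
    """Return a 256-entry list of relative byte frequencies (1-grams)."""
    counts = Counter(data)
    total = len(data) or 1  # avoid division by zero on empty input
    return [counts.get(value, 0) / total for value in range(256)]

def manhattan_distance(hist_a, hist_b):
    """Sum of absolute differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))

def classify_fragment(fragment, reference_histograms):
    """Return the label whose reference histogram is closest to the fragment."""
    fragment_hist = byte_histogram(fragment)
    return min(
        reference_histograms,
        key=lambda label: manhattan_distance(
            fragment_hist, reference_histograms[label]
        ),
    )

# Hypothetical reference samples; a real model would be trained on
# many files per type rather than one synthetic byte string each.
references = {
    "text": byte_histogram(b"The quick brown fox jumps over the lazy dog. " * 20),
    "binary": byte_histogram(bytes(range(256)) * 4),
}

print(classify_fragment(b"A lazy dog sleeps while the fox jumps over it.", references))
```

Single-byte histograms discard ordering information, so longer n-grams or a different distance measure may separate similar formats better; that trade-off is a design choice left open here.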