File Format Identification
File format identification is the process of determining the format of a sequence of bytes. Operating systems typically do this from the file extension or from embedded MIME information. Forensic applications need to identify file types by content, because extensions and metadata can be missing or misleading.
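As a rough illustration of content-based identification, the sketch below compares the leading bytes of a buffer against a small table of well-known signatures ("magic numbers"). The table, the function names `identify_bytes` and `identify_file`, and the fallback type are assumptions made for this example; it is not how any tool listed on this page is implemented, and real tools use far larger rule sets.

```python
# Minimal sketch of signature-based identification (illustrative only).
# SIGNATURES is a hypothetical, deliberately tiny table of magic numbers
# that are expected at offset 0 of the file.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]

# Fallback for content that matches no signature in the table.
UNKNOWN = "application/octet-stream"


def identify_bytes(data: bytes) -> str:
    """Return a MIME type guessed from the leading bytes of data."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return UNKNOWN


def identify_file(path: str) -> str:
    """Identify a file from its first bytes, ignoring its name and extension."""
    with open(path, "rb") as handle:
        head = handle.read(16)  # longer than the longest signature above
    return identify_bytes(head)
```

For example, `identify_bytes(b"%PDF-1.4\n")` returns `"application/pdf"` whatever the file is called. A leading signature alone can be ambiguous: many container formats begin with the ZIP signature, so this sketch would report all of them as `application/zip`.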
- Written in C.
- Rules are stored in /usr/share/file/magic and compiled at runtime.
- Powers the Unix “file” command, but you can also call the library directly from a C program.
- Written in Java.
- Developed by the National Archives of the United Kingdom.
- Configured through an XML file.
- Closed source; free for non-commercial use.
- Proprietary, but a free demo is available.
Current research papers on the file format identification problem are listed below. Most of them concern identifying the format of a few file sectors rather than of an entire file.
- Karresand Martin, Shahmehri Nahid. File type identification of data fragments by their binary structure. In Proceedings of the IEEE Workshop on Information Assurance; 2006b. p. 140–7. [slides]