Difference between pages "Research Topics" and "File Format Identification"

From ForensicsWiki
m (AFFEnhancement)
 
m
 
Line 1: Line 1:
; Research Ideas
+
File Format Identification is the process of figuring out the format of a sequence of bytes. Operating systems typically do this by file extension or by embedded MIME information. Forensic applications need to identify file types by content.
  
Interested in doing research in computer forensics? Looking for a master's topic, or just some ideas for a research paper? Here is my list. Please feel free to add your own ideas.  
+
=Tools=
 +
==libmagic==
 +
* Written in C.
 +
* Rules in /usr/share/file/magic and compiled at runtime.
 +
* Powers the Unix “file” command, but you can also call the library directly from a C program.
 +
* http://sourceforge.net/projects/libmagic
  
 +
==DROID==
 +
* Written in Java.
 +
* Developed by the National Archives of the United Kingdom.
 +
* http://droid.sourceforge.net
  
=Hard Problems=
+
==TrID==
* Stream Based Disk Forensics. Process the entire disk with one pass, or at most two, to minimize seek time. 
+
* XML config file
* Determine the device that created an image or video without metadata. (fingerprinting digital cameras)
+
* Closed source; free for non-commercial use
* Automatically detect falsified digital evidence.
+
* http://mark0.net/soft-trid-e.html
* Use the location of where data resides on a computer as a way of inferring information about the computer's past.
+
* Detect and diagnose sanitization attempts.
+
* Recover overwritten data.
+
  
=Tool Development=
+
==Stellent/Oracle Outside-In==
==[[AFF]] Enhancement==
+
* Proprietary but free demo.
* Evaluation of the AFF data page size. What is the optimal page size for compressed forensic work?
+
* http://www.oracle.com/technology/products/content-management/oit/oit_all.html
* Replacement of the AFF "BADFLAG" approach for indicating bad data with a bitmap.
+
* Modify aimage so that it can take a partial disk image and a disk and just image what's missing.
+
* Improve the data recovery features of aimage.
+
* Replace AFF's current table-of-contents system with one based on B+ Trees.
+
  
==Decoders and Validators==
+
[[Category:Tools]]
* A JPEG decompressor that supports restarts and checkpointing for use in high-speed carving. It would also be useful if the JPEG decompressor did not actually decompress; all it needs to do is verify the Huffman table.
+
  
==Cell Phones==
+
=Bibliography=
Open source tools for:
+
Current research papers on the file format identification problem. Most of these papers focus on identifying the file format from a few file sectors rather than from an entire file.
* Imaging the contents of a cell phone memory
+
* Reassembling information in a cell phone memory
+
  
==Flash Memory==
+
; [http://www.dfrws.org/2008/proceedings/p14-calhoun.pdf Predicting the Types of File Fragments], William Calhoun, Drue Coles, DFRWS 2008 [http://www.dfrws.org/2008/proceedings/p14-calhoun_pres.pdf [slides]]
Flash memory devices such as USB keys implement a [http://www.st.com/stonline/products/literature/an/10122.htm wear leveling algorithm] in hardware so that frequently rewritten blocks are actually written to many different physical blocks. Are there any devices that let you access the raw flash cells underneath the wear leveling chip? Can you get statistics out of the device? Can you access pages that have been mapped out (and still have valid data) but haven't been mapped back yet? Can you use this as a technique for accessing deleted information?
+
  
=Corpora Development=
+
 
==Realistic Corpora==
+
 
* Simulated disk images
+
[[Category:Bibliography]]
* Simulated network traffic
+
==Real Data==
+
* Digital Cameras
+
* Cell phones
+
* USB Memory Sticks ''below'' the logical layer.
+

Revision as of 23:09, 19 October 2008

File Format Identification is the process of figuring out the format of a sequence of bytes. Operating systems typically do this by file extension or by embedded MIME information. Forensic applications need to identify file types by content.
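
As a minimal sketch of identification by content, the Python below compares bytes at fixed offsets against a small table of well-known file signatures. The table contents, the labels, and the function name identify are illustrative assumptions, not taken from any of the tools listed on this page; real tools use far larger rule sets.

```python
# Hypothetical signature table: (offset, magic bytes, label).
# These are widely documented signatures, but the table is deliberately tiny.
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0, b"\xff\xd8\xff", "JPEG image"),
    (0, b"%PDF-", "PDF document"),
    (0, b"GIF87a", "GIF image"),
    (0, b"GIF89a", "GIF image"),
    (0, b"PK\x03\x04", "ZIP archive"),   # also matches ZIP-based containers
    (257, b"ustar", "tar archive"),      # signature is not at offset 0
]


def identify(data: bytes) -> str:
    """Return a label for the first signature that matches, else 'unknown'."""
    for offset, magic, label in SIGNATURES:
        # Slicing past the end of data yields a short or empty bytes object,
        # so buffers shorter than the signature simply fail to match.
        if data[offset:offset + len(magic)] == magic:
            return label
    return "unknown"


if __name__ == "__main__":
    print(identify(b"%PDF-1.4\n"))        # a buffer starting with a PDF header
    print(identify(b"no signature here"))
```

Because the file name is never consulted, a renamed file is still matched by its leading bytes. The obvious limitation is that a format with no fixed signature, or a fragment taken from the middle of a file, falls through to "unknown".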

Tools

libmagic

  • Written in C.
  • Rules in /usr/share/file/magic and compiled at runtime.
  • Powers the Unix “file” command, but you can also call the library directly from a C program.
  • http://sourceforge.net/projects/libmagic

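DROID

  • Written in Java.
  • Developed by the National Archives of the United Kingdom.
  • http://droid.sourceforge.net

TrID

  • XML config file
  • Closed source; free for non-commercial use
  • http://mark0.net/soft-trid-e.html

Stellent/Oracle Outside-In

  • Proprietary but free demo.
  • http://www.oracle.com/technology/products/content-management/oit/oit_all.html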

Bibliography

Current research papers on the file format identification problem. Most of these papers focus on identifying the file format from a few file sectors rather than from an entire file.

Predicting the Types of File Fragments, William Calhoun, Drue Coles, DFRWS 2008 [slides]
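
When only a few sectors are available, there may be no header to match, so fragment-level work usually relies on statistics of the bytes themselves. The sketch below computes one such statistic, the Shannon entropy of a block in bits per byte. It is a generic illustration and is not claimed to be the method of any paper listed above; the function name shannon_entropy and the 512-byte sector size in the usage lines are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a byte block, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)  # frequency of each byte value present
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    sector_size = 512  # assumed sector size for illustration
    zero_sector = b"\x00" * sector_size
    text_sector = (b"The quick brown fox jumps over the lazy dog. " * 12)[:sector_size]
    print(shannon_entropy(zero_sector))
    print(shannon_entropy(text_sector))
```

Compressed or encrypted data tends toward 8 bits per byte, while plain text and sparse structures score noticeably lower. A single number like this cannot distinguish formats on its own, and would normally be one feature among several fed to a classifier.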