Difference between pages "File Format Identification" and "Forensic Live CD issues"

From ForensicsWiki
Line 1: Line 1:
File Format Identification is the process of figuring out the format of a sequence of bytes. Operating systems typically do this by file extension or by embedded MIME information. Forensic applications need to identify file types by content.
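Identification by content can be illustrated with a minimal sketch that compares the first bytes of a buffer against a few well-known signatures ("magic numbers"). The table `SIGNATURES` and the function `identify` are hypothetical names chosen for this illustration; none of the tools listed on this page is claimed to work this way internally, and real tools use far larger rule sets.

```python
# A few widely documented file signatures (leading "magic" bytes).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
]


def identify(data):
    """Return a format name for `data` based on its leading bytes only."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    # The extension is never consulted; unknown content stays unknown.
    return "unknown"
```

A file named `report.txt` that starts with `%PDF-` would still be reported as a PDF document, which is the behaviour forensic applications need.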
+
== The problem ==
  
=Tools=
+
[[Tools#Forensics_Live_CDs | Forensic Linux Live CD distributions]] are widely used during computer forensic investigations. Currently, many vendors of such Live CD distributions make false claims that their distributions "do not touch anything", "write protect everything" and so on. Community-developed distributions are no exception, unfortunately. It turns out that many forensic Linux Live CD distributions are not tested properly and that no suitable test cases have been developed.
==libmagic==
+
* Written in C.  
+
* Rules in /usr/share/file/magic and compiled at runtime.
+
* Powers the Unix “file” command, but you can also call the library directly from a C program.
+
* http://sourceforge.net/projects/libmagic
+
  
==DROID==
+
== Another side of the problem ==
* Written in Java.
+
* Developed by National Archives of the United Kingdom.
+
* http://droid.sourceforge.net
+
  
==TrID==
+
Another side of the problem of insufficient testing of forensic Live CD distributions is that many users do not know what happens "under the hood" of such distributions and cannot adequately test them.
* XML config file
+
* Closed source; free for non-commercial use
+
* http://mark0.net/soft-trid-e.html
+
  
==Forensic Innovations File Investigator TOOLS==
+
=== Example ===
* Proprietary, but free trial available.
+
* Available as consumer applications and OEM API.
+
* Identifies 3,000+ file types, using multiple methods to maintain high accuracy.
+
* Extracts metadata for many of the supported file types.
+
* http://www.forensicinnovations.com/fitools.html
+
  
==Stellent/Oracle Outside-In==
+
For example, [http://forensiccop.blogspot.com/2009/10/forensic-cop-journal-13-2009.html ''Forensic Cop Journal'' (Volume 1(3), Oct 2009)] describes a test case in which an Ext3 file system was mounted with the "-o ro" mount flag as a way to write protect the data. The article says that all tests were successful (i.e. no data modification was found after unmounting the file system), but it is known that a damaged (i.e. not properly unmounted) Ext3 file system cannot be write protected using only the "-o ro" mount flag (write access will be enabled during file system recovery).
* Proprietary but free demo.
+
* http://www.oracle.com/technology/products/content-management/oit/oit_all.html
+
  
==[[Forensic Assistant]]==
+
The question is: will many users test a damaged Ext3 file system (as well as a clean one) when validating their favourite forensic Live CD distribution? My answer is "no", because many users are unaware of this behaviour.
* Proprietary.
+
* Provides detection of password protected archives, some files of cryptographic programs, Pinch/Zeus binary reports, etc.
+
* http://nhtcu.ru/0xFA_eng.html
+
[[Category:Tools]]
+
  
=Data Sets=
+
== Problems ==
If you are working in the field of file format identification, please consider reporting the results of your algorithm with one of these publicly available data sets:
+
* NPS govdocs1m - a corpus of 1 million files that can be redistributed without concern of copyright or PII. Download from http://domex.nps.edu/corp/files/govdocs1/
+
* The NPS Disk Corpus - a corpus of realistic disk images that contain no PII. Information is at: http://digitalcorpora.org/?s=nps
+
  
=Bibliography=
+
Here is a list of common problems of forensic Linux Live CD distributions that developers and users can use for testing purposes. Each problem is followed by an up-to-date list of affected distributions.
Current research papers on the file format identification problem. Most of these papers concern themselves with identifying file format of a few file sectors, rather than an entire file.  '''Please note that this bibliography is in chronological order!'''
+
  
 +
=== Journaling file systems updates ===
  
;2001
+
Mounting (and unmounting) several journaling file systems with only the "-o ro" mount flag may still cause data writes, and the number of writes varies between file systems. Here is a list of such file systems:
  
* Mason McDaniel, [[Media:Mcdaniel01.pdf|Automatic File Type Detection Algorithm]], Masters Thesis, James Madison University,2001
+
{| class="wikitable" border="1"
 +
|-
 +
!  File system
 +
!  When data writes happen
 +
!  Notes
 +
|-
 +
|  Ext3
 +
|  File system requires journal recovery
 +
|  To disable recovery: use "noload" flag, or use "ro,loop" flags, or use "ext2" file system type
 +
|-
 +
|  Ext4
 +
|  File system requires journal recovery
 +
|  To disable recovery: use "noload" flag, or use "ro,loop" flags, or use "ext2" file system type
 +
|-
 +
|  ReiserFS
 +
|  File system has unfinished transactions
 +
|  "nolog" flag does not work (see ''man mount''). To disable journal updates: use "ro,loop" flags
 +
|-
 +
|  XFS
 +
|  Always
 +
|  "norecovery" flag does not help. To disable data writes: use "ro,loop" flags. The bug was fixed in recent 2.6 kernels.
 +
|}
  
; 2003
+
Incorrect mount flags can be used to mount file systems on evidentiary media during the boot process or during the file system preview process. As described above, this may result in data writes to evidentiary media. For example, several Ubuntu-based forensic Live CD distributions mount Ext3/4 file systems on fixed media (e.g. hard drives) during execution of [http://en.wikipedia.org/wiki/Initrd ''initrd''] scripts (these scripts mount every supported file system type on every supported media type using only "-o ro" flag in order to find a root file system image).
  
* [http://www2.computer.org/portal/web/csdl/abs/proceedings/hicss/2003/1874/09/187490332a.pdf Content Based File Type Detection Algorithms], Mason McDaniel and M. Hossain Heydari, 36th Annual Hawaii International Conference on System Sciences (HICSS'03) - Track 9, 2003.
+
[[Image:ext3 recovery.png|thumb|right|[[Helix3]]: damaged Ext3 recovery during the boot]]
  
; 2005
+
List of distributions that recover Ext3 (and sometimes Ext4) file systems during the boot:
  
* Fileprints: identifying file types by n-gram analysis, Li Wei-Jen, Wang Ke, Stolfo SJ, Herzog B., Proceedings of the 2005 IEEE Workshop on Information Assurance, 2005. ([http://www.itoc.usma.edu/workshop/2005/Papers/Follow%20ups/FilePrintPresentation-final.pdf Presentation Slides])  ([http://www1.cs.columbia.edu/ids/publications/FilePrintPaper-revised.pdf PDF])
+
{| class="wikitable" border="1"
 +
|-
 +
!  Distribution
 +
!  Version
 +
|-
 +
|  Helix3
 +
|  2009R1
 +
|-
 +
|  SMART Linux (Ubuntu)
 +
|  2010-01-20
 +
|-
 +
|  FCCU GNU/Linux Forensic Boot CD
 +
|  12.1
 +
|-
 +
|  SPADA
 +
|  4
 +
|}
  
* Douglas J. Hickok, Daine Richard Lesniak, Michael C. Rowe, File Type Detection Technology,  2005 Midwest Instruction and Computing Symposium.([http://www.micsymposium.org/mics_2005/papers/paper7.pdf PDF])
+
=== Root file system spoofing ===
  
; 2006
+
Most Ubuntu-based forensic Live CD distributions use Casper, a set of scripts that completes the initialization process during the early stage of boot. Casper searches all supported devices for a root file system (typically, an image of the live environment), because a bootloader does not pass any information about the boot device to the kernel. It then mounts the root file system and executes the ''/sbin/init'' program on it to continue the boot process. Unfortunately, Casper was not designed to meet computer forensics requirements: it is responsible both for the recovery of damaged Ext3/4 file systems during the boot (see above) and for root file system spoofing.
  
* Karresand Martin, Shahmehri Nahid [http://ieeexplore.ieee.org/iel5/10992/34632/01652088.pdf  File type identification of data fragments by their binary structure. ], Proceedings of the IEEE workshop on information assurance, pp.140–147, 2006.([http://www.itoc.usma.edu/workshop/2006/Program/Presentations/IAW2006-07-3.pdf Presentation Slides])
+
[[Image:Grml.png|thumb|right|[[grml]] mounted root file system from the [[hard drive]]]]
  
* Gregory A. Hall, Sliding Window Measurement for File Type Identification, Computer Forensics and Intrusion Analysis Group, ManTech Security and Mission Assurance, 2006. ([http://www.mantechcfia.com/SlidingWindowMeasurementforFileTypeIdentification.pdf PDF])
+
Currently, Casper may select a fake root file system image on evidentiary media (e.g. [[HDD]]), because no authenticity checks are performed (except an optional UUID check for a possible live file system). Such a fake root file system image may be used to execute malicious code with root privileges during the boot. Knoppix-based forensic Live CD distributions are vulnerable to the same attack.
  
* FORSIGS; Forensic Signature Analysis of the Hard Drive for Multimedia File Fingerprints, John Haggerty and Mark Taylor, IFIP TC11 International Information Security Conference, 2006, Sandton, South Africa.
+
List of Ubuntu-based distributions that allow root file system spoofing:
  
* Martin Karresand , Nahid Shahmehri, "Oscar -- Using Byte Pairs to Find File Type and Camera Make of Data Fragments," Annual Workshop on Digital Forensics and Incident Analysis, Pontypridd, Wales, UK, pp.85-94, Springer-Verlag, 2006.  
+
{| class="wikitable" border="1"
 +
|-
 +
!  Distribution
 +
!  Version
 +
!  Notes
 +
|-
 +
|  Helix3
 +
|  2009R1
 +
|
 +
|-
 +
|  Helix3 Pro
 +
|  2009R3
 +
|
 +
|-
 +
|  CAINE
 +
|  1.5
 +
|
 +
|-
 +
|  DEFT Linux
 +
|  5
 +
|
 +
|-
 +
|  Raptor
 +
|  20091026
 +
|
 +
|-
 +
|  grml
 +
|  2009.10
 +
|  Actually, [[grml]] uses live-initramfs scripts (Casper fork)
 +
|-
 +
|  BackTrack
 +
|  4
 +
|
 +
|-
 +
|  SMART Linux (Ubuntu)
 +
|  2010-01-20
 +
|
 +
|-
 +
|  FCCU GNU/Linux Forensic Boot CD
 +
|  12.1
 +
|
 +
|}
  
; 2007
+
Vulnerable Knoppix-based distributions include: SPADA, LinEn boot CD, BitFlare.
  
* Karresand M., Shahmehri N., [http://dx.doi.org/10.1007/0-387-33406-8_35 Oscar: File Type Identification of Binary Data in Disk Clusters and RAM Pages], Proceedings of IFIP International Information Security Conference: Security and Privacy in Dynamic Environments (SEC2006), Springer, ISBN 0-387-33405-x, pp.413-424, Karlstad, Sweden, May 2006.
+
=== Swap space activation ===
  
* Robert F. Erbacher and John Mulholland, "Identification and Localization of Data Types within Large-Scale File Systems," Proceedings of the 2nd International Workshop on Systematic Approaches to Digital Forensic Engineering, Seattle, WA, April 2007.
+
=== Incorrect mount policy ===
  
* Ryan M. Harris, "Using Artificial Neural Networks for Forensic File Type Identification," Master's Thesis, Purdue University, May 2007. ([https://www.cerias.purdue.edu/tools_and_resources/bibtex_archive/archive/2007-19.pdf PDF])
+
==== HAL ====
  
* Predicting the Types of File Fragments, William Calhoun, Drue Coles, DFRWS 2008. ([http://www.dfrws.org/2008/proceedings/p14-calhoun_pres.pdf Presentation Slides])  ([http://www.dfrws.org/2008/proceedings/p14-calhoun.pdf PDF])
+
==== rebuildfstab and scanpartition scripts ====
  
* Sarah J. Moody and Robert F. Erbacher, [http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=04545366 SÁDI – Statistical Analysis for Data type Identification], 3rd International Workshop on Systematic Approaches to Digital Forensic Engineering, 2008.
+
=== Incorrect write-blocking approach ===
  
; 2008
+
== See also ==
  
* Mehdi Chehel Amirani, Mohsen Toorani, and Ali Asghar Beheshti Shirazi, [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4625611 A New Approach to Content-based File Type Detection], Proceedings of the 13th IEEE Symposium on Computers and Communications (ISCC'08), pp.1103-1108, July 2008.  ([http://arxiv.org/ftp/arxiv/papers/1002/1002.3174.pdf PDF])
+
* [http://www.computer-forensics-lab.org/pdf/Linux_for_computer_forensic_investigators_2.pdf Linux for computer forensic investigators: problems of booting trusted operating system]
 
+
* [http://www.computer-forensics-lab.org/pdf/Linux_for_computer_forensic_investigators.pdf Linux for computer forensic investigators: «pitfalls» of mounting file systems]
; 2009
+
* Roussev, Vassil, and Garfinkel, Simson, "File Fragment Classification-The Case for Specialized Approaches," Systematic Approaches to Digital Forensics Engineering (IEEE/SADFE 2009), Oakland, California. ([http://simson.net/clips/academic/2009.SADFE.Fragments.pdf PDF])
+
 
+
* Irfan Ahmed, Kyung-suk Lhee, Hyunjung Shin and ManPyo Hong, [http://www.springerlink.com/content/g2655k2044615q75/ On Improving the Accuracy and Performance of Content-based File Type Identification], Proceedings of the 14th Australasian Conference on Information Security and Privacy (ACISP 2009), pp.44-59, LNCS (Springer), Brisbane, Australia, July 2009.
+
 
+
; 2010
+
*Irfan Ahmed, Kyung-suk Lhee, Hyunjung Shin and ManPyo Hong, [http://www.alphaminers.net/sub05/sub05_03.php?swf_pn=5&swf_sn=3&swf_pn2=3 Fast File-type Identification], Proceedings of the 25th ACM Symposium on Applied Computing (ACM SAC 2010), ACM, Sierre, Switzerland, March 2010.
+
[[Category:Bibliographies]]
+

Revision as of 17:34, 3 February 2010

The problem

Forensic Linux Live CD distributions are widely used during computer forensic investigations. Currently, many vendors of such Live CD distributions make false claims that their distributions "do not touch anything", "write protect everything" and so on. Community-developed distributions are no exception, unfortunately. It turns out that many forensic Linux Live CD distributions are not tested properly and that no suitable test cases have been developed.

Another side of the problem

Another side of the problem of insufficient testing of forensic Live CD distributions is that many users do not know what happens "under the hood" of such distributions and cannot adequately test them.

Example

For example, Forensic Cop Journal (Volume 1(3), Oct 2009) describes a test case in which an Ext3 file system was mounted with the "-o ro" mount flag as a way to write protect the data. The article says that all tests were successful (i.e. no data modification was found after unmounting the file system), but it is known that a damaged (i.e. not properly unmounted) Ext3 file system cannot be write protected using only the "-o ro" mount flag (write access will be enabled during file system recovery).

The question is: will many users test a damaged Ext3 file system (as well as a clean one) when validating their favourite forensic Live CD distribution? My answer is "no", because many users are unaware of this behaviour.
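A test of this kind can be sketched as a hash comparison taken before and after the operation under test. The helper names `sha256_of` and `image_unchanged` are hypothetical, and the sketch assumes the tester works on a copy of a deliberately damaged Ext3 image rather than on real evidence; the mount and unmount steps themselves are left to the caller.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file or image at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        # Read in chunks so that large disk images do not fill memory.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_unchanged(path, action):
    """Hash `path`, run `action()`, hash again; True if nothing changed."""
    before = sha256_of(path)
    action()  # e.g. mount and unmount the image with the flags under test
    after = sha256_of(path)
    return before == after
```

Running the same check against both a clean and a damaged image is what distinguishes the case described above: only the damaged image triggers recovery.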

Problems

Here is a list of common problems of forensic Linux Live CD distributions that developers and users can use for testing purposes. Each problem is followed by an up-to-date list of affected distributions.

Journaling file systems updates

Mounting (and unmounting) several journaling file systems with only the "-o ro" mount flag may still cause data writes, and the number of writes varies between file systems. Here is a list of such file systems:

{| class="wikitable" border="1"
|-
!  File system
!  When data writes happen
!  Notes
|-
|  Ext3
|  File system requires journal recovery
|  To disable recovery: use "noload" flag, or use "ro,loop" flags, or use "ext2" file system type
|-
|  Ext4
|  File system requires journal recovery
|  To disable recovery: use "noload" flag, or use "ro,loop" flags, or use "ext2" file system type
|-
|  ReiserFS
|  File system has unfinished transactions
|  "nolog" flag does not work (see man mount). To disable journal updates: use "ro,loop" flags
|-
|  XFS
|  Always
|  "norecovery" flag does not help. To disable data writes: use "ro,loop" flags. The bug was fixed in recent 2.6 kernels.
|}
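The notes in this table can be turned into a small helper that builds a mount command line instead of relying on "-o ro" alone. This is only a sketch: the names `SAFER_OPTIONS` and `readonly_mount_command` are hypothetical, the helper uses the "ro,loop" combination because the table lists it for every file system, and it only constructs the argument list without running anything.

```python
# Mount options taken from the "Notes" column of the table above.
SAFER_OPTIONS = {
    "ext3": "ro,loop",
    "ext4": "ro,loop",
    "reiserfs": "ro,loop",
    "xfs": "ro,loop",
}


def readonly_mount_command(fstype, source, mount_point):
    """Build a `mount` argument list for a journaling file system."""
    key = fstype.lower()
    if key not in SAFER_OPTIONS:
        # Refuse to guess options for file systems the table does not cover.
        raise ValueError("no known safe options for %r" % fstype)
    return ["mount", "-t", key, "-o", SAFER_OPTIONS[key], source, mount_point]
```

The resulting list can be passed to `subprocess.run` by a test script; whether the options really prevent writes on a given kernel should still be verified, for example with a hash comparison.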

Incorrect mount flags can be used to mount file systems on evidentiary media during the boot process or during the file system preview process. As described above, this may result in data writes to evidentiary media. For example, several Ubuntu-based forensic Live CD distributions mount Ext3/4 file systems on fixed media (e.g. hard drives) during execution of initrd scripts (these scripts mount every supported file system type on every supported media type using only "-o ro" flag in order to find a root file system image).

Helix3: damaged Ext3 recovery during the boot

List of distributions that recover Ext3 (and sometimes Ext4) file systems during the boot:

{| class="wikitable" border="1"
|-
!  Distribution
!  Version
|-
|  Helix3
|  2009R1
|-
|  SMART Linux (Ubuntu)
|  2010-01-20
|-
|  FCCU GNU/Linux Forensic Boot CD
|  12.1
|-
|  SPADA
|  4
|}

Root file system spoofing

Most Ubuntu-based forensic Live CD distributions use Casper, a set of scripts that completes the initialization process during the early stage of boot. Casper searches all supported devices for a root file system (typically, an image of the live environment), because a bootloader does not pass any information about the boot device to the kernel. It then mounts the root file system and executes the /sbin/init program on it to continue the boot process. Unfortunately, Casper was not designed to meet computer forensics requirements: it is responsible both for the recovery of damaged Ext3/4 file systems during the boot (see above) and for root file system spoofing.

grml mounted root file system from the hard drive

Currently, Casper may select a fake root file system image on evidentiary media (e.g. HDD), because no authenticity checks are performed (except an optional UUID check for a possible live file system). Such a fake root file system image may be used to execute malicious code with root privileges during the boot. Knoppix-based forensic Live CD distributions are vulnerable to the same attack.
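One possible form of the missing authenticity check is to compare the digest of a candidate root file system image with a digest recorded on trusted media. The sketch below is an assumption about how such a check could look, not a description of what Casper or any distribution does; `is_trusted_image` and its `expected_hex_digest` parameter are hypothetical.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_trusted_image(path, expected_hex_digest):
    """True only if the image at `path` matches the expected digest."""
    actual = sha256_of(path)
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual, expected_hex_digest.lower())
```

An image found on evidentiary media would fail this check unless it is byte-for-byte identical to the genuine one, so a boot script could skip it instead of mounting it as root.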

List of Ubuntu-based distributions that allow root file system spoofing:

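{| class="wikitable" border="1"
|-
!  Distribution
!  Version
!  Notes
|-
|  Helix3
|  2009R1
|
|-
|  Helix3 Pro
|  2009R3
|
|-
|  CAINE
|  1.5
|
|-
|  DEFT Linux
|  5
|
|-
|  Raptor
|  20091026
|
|-
|  grml
|  2009.10
|  Actually, grml uses live-initramfs scripts (Casper fork)
|-
|  BackTrack
|  4
|
|-
|  SMART Linux (Ubuntu)
|  2010-01-20
|
|-
|  FCCU GNU/Linux Forensic Boot CD
|  12.1
|
|}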

Vulnerable Knoppix-based distributions include: SPADA, LinEn boot CD, BitFlare.

Swap space activation

Incorrect mount policy

HAL

rebuildfstab and scanpartition scripts

Incorrect write-blocking approach

See also