Difference between pages "Bulk extractor" and "FAT"

From ForensicsWiki
m (current version is 1.4.4)
 
(External links)
 
== Overview ==
+
'''FAT''', or File Allocation Table, is a [[File Systems|file system]] that is designed to keep track of the allocation status of clusters on a [[hard drive]]. Developed in 1977 by [[Microsoft]] Corporation, FAT was originally intended to be a [[File Systems|file system]] for the Microsoft Disk BASIC interpreter. FAT was quickly incorporated into an early version of Tim Paterson's QDOS, which was a moniker for "Quick and Dirty Operating System". [[Microsoft]] later purchased the rights to QDOS and released it under Microsoft branding as PC-DOS and later, MS-DOS.
'''bulk_extractor''' is a computer forensics tool that scans a disk image, a file, or a directory of files and extracts useful information without parsing the file system or file system structures. The results can be easily inspected, parsed, or processed with automated tools. '''bulk_extractor''' also creates histograms of the features that it finds, as features that are more common tend to be more important. The program can be used for law enforcement, defense, intelligence, and cyber-investigation applications.
+
  
bulk_extractor is distinguished from other forensic tools by its speed and thoroughness. Because it ignores file system structure, bulk_extractor can process different parts of the disk in parallel. In practice, the program splits the disk up into 16MiByte pages and processes one page on each available core. This means that a 24-core machine processes a disk roughly 24 times faster than a 1-core machine. bulk_extractor is also thorough, because it automatically detects, decompresses, and recursively re-processes data that has been compressed with a variety of algorithms. Our testing has shown that there is a significant amount of compressed data in the unallocated regions of file systems that is missed by most forensic tools that are commonly in use today.
+
== Specification ==
  
Another advantage of ignoring file systems is that bulk_extractor can be used to process any digital media. We have used the program to process hard drives, SSDs, optical media, camera cards, cell phones, network packet dumps, and other kinds of digital information.
+
FAT is described by Microsoft in [[Media:Fatgen103.doc|Microsoft's FAT32 specification]]. Despite the name, the document includes descriptions of FAT12 and FAT16.
  
==Output Feature Files==
+
Closely related standards are ECMA-107 and ISO/IEC 9293, which cover only FAT12 and FAT16 and are somewhat more restricted than the file system described by Microsoft's document.
  
bulk_extractor now creates an output directory that includes:
+
== Structure==
* '''ccn.txt''' -- Credit card numbers
+
* '''ccn_track2.txt''' -- Credit card “track 2″ information
+
* '''domain.txt''' -- Internet domains found on the drive, including dotted-quad addresses found in text.
+
* '''email.txt''' -- Email addresses
+
* '''ether.txt''' -- Ethernet MAC addresses found through IP packet carving of swap files and compressed system hibernation files and file fragments.
+
* '''exif.txt''' -- EXIFs from JPEGs and video segments. This feature file contains all of the EXIF fields, expanded as XML records.
+
* '''find.txt''' -- The results of specific regular expression search requests.
+
* '''ip.txt''' -- IP addresses found through IP packet carving.
+
* '''telephone.txt''' --- US and international telephone numbers.
+
* '''url.txt''' --- URLs, typically found in browser caches, email messages, and pre-compiled into executables.
+
* '''url_searches.txt''' --- A histogram of terms used in Internet searches from services such as Google, Bing, Yahoo, and others.
+
* '''wordlist.txt''' --- A list of all “words” extracted from the disk, useful for password cracking.
+
* '''wordlist_*.txt''' --- The wordlist with duplicates removed, formatted in a form that can be easily imported into a popular password-cracking program.
+
* '''zip.txt''' --- A file containing information regarding every ZIP file component found on the media. This is exceptionally useful as ZIP files contain internal structure and ZIP is increasingly the compound file format of choice for a variety of products such as Microsoft Office.
+
  
For each of the above, two additional files may be created:
+
{| style="text-align:center;" cellpadding="3" border="1px"
* '''*_stopped.txt''' --- bulk_extractor supports a stop list, or a list of items that do not need to be brought to the user’s attention. However rather than simply suppressing this information, which might cause something critical to be hidden, stopped entries are stored in the stopped files.
+
| Boot sector
* '''*_histogram.txt''' --- bulk_extractor can also create histograms of features. This is important, as experience has shown that email addresses, domain names, URLs, and other information that appear more frequently on a hard drive or in a cell phone’s memory can be used to rapidly create a pattern of life report.
+
| More reserved<br/> sectors (optional)
 +
| FAT #1
 +
| FAT #2
 +
| Root directory<br /> (FAT12/16 only)
 +
| Data region<br /> (rest of disk)
 +
|}
  
Bulk extractor also creates a file that captures the provenance of the run:
+
=== Boot Record ===
;report.xml
+
When a computer is powered on, a POST (power-on self test) is performed, and control is then transferred to the [[Master boot record]] ([[MBR]]). The [[MBR]] is present no matter what file system is in use, and contains information about how the storage device is logically partitioned.  When using a FAT file system, the [[MBR]] hands off control of the computer to the Boot Record, which is the first sector on the partition. The Boot Record, which occupies a reserved area on the partition, contains executable code, in addition to information such as an OEM identifier, number of FATs, media descriptor (type of storage device), and information about the operating system to be booted. Once the Boot Record code executes, control is handed off to the operating system installed on that partition.
:A Digital Forensics XML report that includes information about the source media, how the bulk_extractor program was compiled and run, the time to process the digital evidence, and a meta report of the information that was found.
+
  
==Post-Processing==
+
=== FATs ===
 +
The primary task of the File Allocation Tables is to keep track of the allocation status of clusters, or logical groupings of sectors, on the disk drive. There are four different possible FAT entries: allocated (along with the address of the next cluster associated with the file), unallocated, end of file, and bad sector.
  
We have developed four programs for post-processing the bulk_extractor output:
+
In order to provide redundancy in case of data corruption, two FATs, FAT1 and FAT2, are stored in the file system. FAT2 is typically a duplicate of FAT1. However, FAT mirroring can be disabled on a FAT32 drive, thus enabling any of the FATs to become the Primary FAT. This possibly leaves FAT1 empty, which can be deceiving.
;bulk_diff.py
+
:This program reports the differences between two bulk_extractor runs. The intent is to image a computer, run bulk_extractor on a disk image, let the computer run for a period of time, re-image the computer, run bulk_extractor on the second image, and then report the differences. This can be used to infer the user’s activities within a time period.
+
;cda_tool.py
+
:This tool, currently under development, reads multiple bulk_extractor reports from multiple runs against multiple drives and performs a multi-drive correlation using Garfinkel’s Cross Drive Analysis technique. This can be used to automatically identify new social networks or to identify new members of existing networks.
+
;identify_filenames.py
+
:In the bulk_extractor feature file, each feature is annotated with the byte offset from the beginning of the image in which it was found. The program takes as input a bulk_extractor feature file and a DFXML file containing the locations of each file on the drive (produced with Garfinkel’s fiwalk program) and produces an annotated feature file that contains the offset, feature, and the file in which the feature was found.
+
;make_context_stop_list.py
+
:Although forensic analysts frequently make “stop lists” (for example, a list of email addresses that appear in the operating system and should therefore be ignored), such lists have a significant problem. Because it is relatively easy to get an email address into the binary of an open source application, ignoring all of these email addresses may make it possible to cloak email addresses from forensic analysis. Our solution is to create context-sensitive stop lists, in which the feature to be stopped is presented with the context in which it occurs. The make_context_stop_list.py program takes the results of multiple bulk_extractor runs and creates a single context-sensitive stop list that can then be used to suppress features when found in a specific context. One such stop list constructed from Windows and Linux operating systems is available on the bulk extractor website.
+
  
== Download ==
+
=== Root Directory ===
The current version of '''bulk_extractor''' is 1.4.4.  
+
The Root Directory, sometimes referred to as the Root Folder, contains an entry for each file and directory stored in the file system. Each entry includes the file name, starting cluster number, and file size, and is updated whenever a file is created or subsequently modified. On FAT12 and FAT16 the root directory has a fixed size, typically 512 entries on a hard disk; on a floppy disk the size depends on the disk format. With FAT32 it can be stored anywhere within the partition, although in previous versions it is always located immediately following the FAT region.
  
* Downloads are available at: http://digitalcorpora.org/downloads/bulk_extractor/
+
=== Data Area ===
* A Windows installer with the GUI can be downloaded from: http://www.digitalcorpora.org/downloads/bulk_extractor/bulk_extractor-1.4.1-windowsinstaller.exe
+
  
== Bibliography ==
+
The Boot Record, FATs, and Root Directory are collectively referred to as the System Area. The remaining space on the logical drive is called the Data Area, which is where files are actually stored. It should be noted that when a file is deleted by the operating system, the data stored in the Data Area remains intact until it is overwritten.
=== Academic Publications ===
+
# Garfinkel, Simson, [http://simson.net/clips/academic/2013.COSE.bulk_extractor.pdf Digital media triage with bulk data analysis and bulk_extractor]. Computers and Security 32: 56-72 (2013)
+
# Beverly, Robert, Simson Garfinkel and Greg Cardwell, [http://simson.net/clips/academic/2011.DFRWS.ipcarving.pdf "Forensic Carving of Network Packets and Associated Data Structures"], DFRWS 2011, Aug. 1-3, 2011, New Orleans, LA. BEST PAPER AWARD (Acceptance rate: 23%, 14/62)
+
# Garfinkel, S., [http://simson.net/clips/academic/2006.DFRWS.pdf Forensic Feature Extraction and Cross-Drive Analysis], The 6th Annual Digital Forensic Research Workshop, Lafayette, Indiana, August 14-16, 2006. (Acceptance rate: 43%, 16/37)
+
  
===YouTube===
+
=== Clusters ===
'''[http://www.youtube.com/results?search_query=bulk_extractor search YouTube] for bulk_extractor videos'''
+
In order for FAT to manage files with satisfactory efficiency, it groups sectors into larger blocks referred to as clusters. A cluster is the smallest unit of disk space that can be allocated to a file, which is why clusters are often called allocation units. Each cluster can be used by one and only one resident file. Only the "data area" is divided into clusters; the rest of the partition is simply sectors. Cluster size is determined by the size of the disk volume, and every file must be allocated a whole number of clusters. Cluster sizing has a significant impact on performance and disk utilization. Larger cluster sizes result in more wasted space because files are less likely to fill a whole number of clusters exactly.
* [http://www.youtube.com/watch?v=odvDTGA7rYI Simson Garfinkel speaking at CERIAS about bulk_extractor]
+
* [http://www.youtube.com/watch?v=wTBHM9DeLq4 BackTrack 5 with bulk_extractor]
+
* [http://www.youtube.com/watch?v=QVfYOvhrugg Ubuntu 12.04 forensics with bulk_extractor]
+
* [http://www.youtube.com/watch?v=57RWdYhNvq8 Social Network forensics with bulk_extractor]
+
  
===Tutorials===
+
The size of one cluster is specified in the Boot Record and can range from a single sector (512 bytes) to 128 sectors (65536 bytes). The sectors in a cluster are contiguous, therefore each cluster is a contiguous block of space on the disk. Note that only one file can be allocated to a cluster. Therefore if a 1KB file is placed within a 32KB cluster there are 31KB of wasted space. The formula for determining the number of clusters in a partition is ((# of Sectors in Partition) - (# of Sectors per FAT * 2) - (# of Reserved Sectors)) / (# of Sectors per Cluster).
# [http://simson.net/ref/2012/2012-08-08%20bulk_extractor%20Tutorial.pdf Using bulk_extractor for digital forensics triage and cross-drive analysis], DFRWS 2012
+
 
 +
=== Wasted Sectors ===
 +
 
 +
'''Wasted Sectors''' (a.k.a. '''partition [[slack]]''') are a result of the number of data sectors not being evenly divisible by the cluster size. It's made up of unused bytes left at the end of a file. Also, if the partition as declared in the partition table is larger than what is claimed in the Boot Record, the volume can be said to have wasted sectors. Small files on a hard drive are the reason for wasted space, and the bigger the hard drive the more wasted space there is.
 +
 
 +
=== FAT Entry Values ===
 +
<br>
 +
FAT12<br>
 +
<br>
 +
0x000          (Free Cluster)<br>   
 +
0x001          (Reserved Cluster)<br>
 +
0x002 - 0xFEF  (Used cluster; value points to next cluster)<br>
 +
0xFF0 - 0xFF6  (Reserved values)<br>
 +
0xFF7          (Bad cluster)<br>
 +
0xFF8 - 0xFFF  (Last cluster in file)<br>
 +
<br>
 +
FAT16<br>
 +
<br>
 +
0x0000          (Free Cluster)<br>
 +
0x0001          (Reserved Cluster)<br>
 +
0x0002 - 0xFFEF  (Used cluster; value points to next cluster)<br>
 +
0xFFF0 - 0xFFF6  (Reserved values)<br>
 +
0xFFF7          (Bad cluster)<br>
 +
0xFFF8 - 0xFFFF  (Last cluster in file)<br>
 +
<br>
 +
FAT32<br>
 +
<br>
 +
0x?0000000              (Free Cluster)<br>
 +
0x?0000001              (Reserved Cluster)<br>
 +
0x?0000002 - 0x?FFFFFEF  (Used cluster; value points to next cluster)<br>
 +
0x?FFFFFF0 - 0x?FFFFFF6  (Reserved values)<br>
 +
0x?FFFFFF7              (Bad cluster)<br>
 +
0x?FFFFFF8 - 0x?FFFFFFF  (Last cluster in file)
 +
 
 +
Note: FAT32 uses only 28 of the 32 possible bits; the upper 4 bits should be left alone. Typically these bits are zero, and are represented above by a question mark (?).
 +
 
 +
==Versions==
 +
 
 +
There are three variants of FAT in existence: FAT12, FAT16, and FAT32.
 +
 
 +
=== FAT12 ===
 +
*  FAT12 is the oldest type of FAT that uses a 12 bit file allocation table entry. 
 +
*  FAT12 can hold a max of 4,084 clusters (which is 2<sup>12</sup> clusters minus a few values that are reserved for values used in  the FAT). 
 +
*  It is used for floppy disks and hard drive partitions that are smaller than 16 MB. 
 +
*  All 1.44 MB 3.5" floppy disks are formatted using FAT12.
 +
*  Cluster size that is used is between 0.5 KB to 4 KB.
 +
 
 +
=== FAT16 ===
 +
*  It is called FAT16 because all entries are 16 bit.
 +
*  FAT16 can hold a max of 65,524 addressable units
 +
*  It is used for small and moderate sized hard disk volumes.
 +
 
 +
=== FAT32 ===
 +
FAT32 is the enhanced version of the FAT system implemented beginning with Windows 95 OSR2, Windows 98, and Windows Me.
 +
Features include:
 +
*  Drives of up to 2 terabytes are supported ([[Windows]] 2000 only supports up to 32 gigabytes)
 +
*  Since FAT32 uses smaller clusters (of 4 kilobytes each), it uses hard drive space more efficiently. This is a 10 to 15 percent improvement over FAT or FAT16.
 +
*  The limitations of FAT or FAT 16 on the number of root folder entries have been eliminated. In FAT32, the root folder is an ordinary cluster chain, and can be located anywhere on the drive.
 +
*  File allocation mirroring can be disabled in FAT32. This allows a different copy of the file allocation table than the default to be active.
 +
 
 +
==== Limitations with [[Windows]] 2000 & [[Windows]] XP ====
 +
* Clusters cannot be 64KB or larger.
 +
* Cannot decrease the cluster size if doing so would result in the FAT being larger than 16 MB minus 64KB in size.
 +
* Cannot contain fewer than 65,527 clusters.
 +
* Maximum of 32KB per cluster.
 +
* ''[[Windows]] XP'': The Windows XP installation program will not allow a user to format a drive of more than 32GB using the FAT32 file system. Using the installation program, the only way to format a disk greater than 32GB in size is to use NTFS. A disk larger than 32GB in size ''can'' be formatted with FAT32 for use with Windows XP if the system is booted from a Windows 98 or Windows ME startup disk, and formatted using the tool that will be on the disk.
 +
 
 +
=== exFAT (sometimes incorrectly called FAT64) ===
 +
exFAT (also known as Extended File Allocation Table) is Microsoft's latest version of FAT and works with Windows Embedded CE 6.0, Windows XP/Server 2003 (with a KB patch), Vista/Server 2008 SP1 and later, and Windows 7.
 +
Features include:
 +
*  Largest file size is 2<sup>64</sup> bytes (16 exabytes) vs. FAT32's maximum file size of 4GB.
 +
*  Has transaction support using Transaction-Safe Extended FAT File System (TexFAT). (Not released yet in Desktop/Server OS)
 +
*  Speeds up storage allocation processes by using free space bitmaps.
 +
*  Support UTC timestamps (Vista/Server 2008 SP1 does not support UTC, UTC support came out with SP2)
 +
*  Maximum Cluster size of 32MB (Fat32 is 32KB)
 +
*  Sector sizes from 512 bytes to 4096 bytes in size
 +
*  Maximum FAT supportable volume size of 128PB
 +
*  Maximum subdirectory size of 256MB, which can support over 2 million files in a single subdirectory
 +
*  Uses a Bitmap for cluster allocation
 +
*  Supports File Permissions (Not released yet in Desktop/Server OS)
 +
*  Has been selected as the exclusive file system of the SDXC memory card by the SD Association
 +
 
 +
Although Microsoft has published some information on exFAT, there are more technical specifications available from third parties. For example, here is a  [http://paradigmsolutions.files.wordpress.com/2009/12/exfat-excerpt-1-4.pdf detailed presentation on exFAT].
 +
 
 +
Another published technical paper that goes in the internals in great detail is in the SANS Reading Room at: [http://www.sans.org/reading_room/whitepapers/forensics/rss/reverse_engineering_the_microsoft_exfat_file_system_33274 Reverse Engineering the Microsoft exFAT File System]
 +
 
 +
=== Comparison of FAT Versions ===
 +
 
 +
See the table at http://en.wikipedia.org/wiki/File_Allocation_Table for more detailed information about the various versions of FAT.
 +
 
 +
== Uses ==
 +
Due to its low cost, mobility, and non-volatile nature, flash memory has quickly become the choice medium for storing and transferring data in consumer electronic devices. The majority of flash memory storage is formatted using the FAT file system.  In addition, FAT is also frequently used in electronic devices with miniature hard drives.
 +
 
 +
Examples of devices in which FAT is utilized include:
 +
 
 +
* [[USB]] thumb drives
 +
* [[Digital camera|Digital cameras]]
 +
* Digital camcorders
 +
* Portable audio and video players
 +
* Multifunction [[printers]]
 +
* Electronic photo frames
 +
* Electronic musical instruments
 +
* Standard televisions
 +
* [[PDAs]]
 +
 
 +
==Data Recovery==
 +
Recovering directory entries from FAT filesystems as part of [[recovering deleted data]] can be accomplished by looking for entries that begin with the byte 0xE5, which is displayed as a lowercase sigma in code page 437. When a file or directory is deleted under a FAT filesystem, the first character of its name is changed to this value. The remainder of the directory entry information remains intact.
 +
 
 +
The pointers are also changed to zero for each cluster used by the file. Recovery tools look at the FAT to find the entry for the file. The location of the starting cluster will still be in the directory file. It is not deleted or modified. The tool will go straight to that cluster and try to recover the file using the file size to determine the number of clusters to recover. Some tools will go to the starting cluster and recover the next "X" number of clusters needed for the specific file size. However, this approach is not ideal. An ideal tool will locate "X" number of available clusters. Since files are most often fragmented, this will be a more precise way to recover the file.
 +
 
 +
An issue arises when two files in the same row of clusters are deleted.  If the clusters are not in sequential order, the tool will automatically receive "X" number of clusters.  However, because the file was fragmented, it's most likely that all the clusters obtained will not all contain data for that file.  If these two deleted files are in the same row of clusters, it is highly unlikely the file can be recovered.
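
The first recovery step described above, locating directory entries whose first byte is 0xE5, can be sketched as follows. This is a minimal illustration, not the method of any particular recovery tool. It assumes the standard 32-byte short directory entry (8.3 name at offset 0, attribute byte at offset 11, starting cluster at offset 26, file size at offset 28); the function name <code>find_deleted_entries</code> is hypothetical.

```python
import struct

DELETED_MARKER = 0xE5   # first name byte of a deleted entry
END_OF_DIRECTORY = 0x00  # first name byte when no further entries follow
LONG_NAME_ATTR = 0x0F    # attribute value used by long-file-name entries


def find_deleted_entries(directory: bytes) -> list:
    """Return name, starting cluster and size for each deleted short entry."""
    found = []
    for offset in range(0, len(directory) - 31, 32):
        entry = directory[offset:offset + 32]
        if entry[0] == END_OF_DIRECTORY:
            break
        if entry[0] != DELETED_MARKER or entry[11] == LONG_NAME_ATTR:
            continue
        # Low 16 bits of the starting cluster, then the 32-bit file size.
        start_cluster, size = struct.unpack_from("<HI", entry, 26)
        # The first character was overwritten on deletion, so show "?".
        base = "?" + entry[1:8].decode("ascii", "replace").rstrip()
        ext = entry[8:11].decode("ascii", "replace").rstrip()
        name = base + "." + ext if ext else base
        found.append({"name": name, "start_cluster": start_cluster, "size": size})
    return found
```

The starting cluster and size returned here are what a tool would then use to decide how many clusters to read back. On FAT32 the high 16 bits of the starting cluster are stored separately at offset 20 and are not handled by this sketch.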
 +
 
 +
==File [[Slack]]==
 +
File [[slack]] is data that starts from the end of the file written and continues to the end of the sectors designated to the file. There are two types of file [[slack]], RAM slack and Residual [[slack]]. RAM slack starts from the end of the file and goes to the end of that sector. Residual slack then starts at the next sector and goes to the end of the cluster allocated for the file. File slack is helpful when analyzing a hard drive because the old data that is not overwritten by the new file is still intact. Go to http://www.pcguide.com/ref/hdd/file/partSizes-c.html for examples.
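
The two kinds of slack can be worked out from the file size alone once the sector and cluster sizes are known. The sketch below is only an illustration of that arithmetic; the function name <code>file_slack</code> and its default values (512-byte sectors, 8 sectors per cluster) are assumptions chosen for the example.

```python
def file_slack(file_size: int, sector_size: int = 512, sectors_per_cluster: int = 8):
    """Return (ram_slack, residual_slack) in bytes for the last cluster of a file."""
    cluster_size = sector_size * sectors_per_cluster
    # Bytes from the end of the file to the end of its last sector.
    ram_slack = -file_size % sector_size
    # Bytes from the end of the file to the end of its last cluster.
    total_slack = -file_size % cluster_size
    # Whatever is left after RAM slack lies in the untouched trailing sectors.
    residual_slack = total_slack - ram_slack
    return ram_slack, residual_slack
```

For a 1000-byte file with these defaults, the last sector holds 24 bytes of RAM slack and the remaining six sectors of the 4096-byte cluster hold 3072 bytes of residual slack.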
 +
 
 +
 
 +
<table border="1" cellspacing="2" bordercolor="#000000" cellpadding="4" width="468" bordercolorlight="#C0C0C0">
 +
  <tr>
 +
    <td width="101" bgcolor="#808080"><font size="2"><b><center>Cluster</center></b></font></td>
 +
    <td width="177" bgcolor="#808080"><font size="2"><b><center>Sample Slack Space,
 +
    50% Cluster Slack Per File</center></b></font></td>
 +
    <td width="178" bgcolor="#808080"><font size="2"><b><center>Sample Slack Space,
 +
    67% Cluster Slack Per File</center></b></font></td>
 +
  </tr>
 +
  <tr>
 +
    <td width="101" bgcolor="#C0C0C0"><font size="2"><b><center>2 kiB</center></b></font></td>
 +
    <td width="177"><font size="2"><center>17 MB</center></font></td>
 +
    <td width="178"><font size="2"><center>22 MB</center></font></td>
 +
  </tr>
 +
  <tr>
 +
    <td width="101" bgcolor="#C0C0C0"><font size="2"><b><center>4 kiB</center></b></font></td>
 +
    <td width="177"><font size="2"><center>33 MB</center></font></td>
 +
    <td width="178"><font size="2"><center>44 MB</center></font></td>
 +
  </tr>
 +
  <tr>
 +
    <td width="101" bgcolor="#C0C0C0"><font size="2"><b><center>8 kiB</center></b></font></td>
 +
    <td width="177"><font size="2"><center>66 MB</center></font></td>
 +
    <td width="178"><font size="2"><center>89 MB</center></font></td>
 +
  </tr>
 +
  <tr>
 +
    <td width="101" bgcolor="#C0C0C0"><font size="2"><b><center>16 kiB</center></b></font></td>
 +
    <td width="177"><font size="2"><center>133 MB</center></font></td>
 +
    <td width="178"><font size="2"><center>177 MB</center></font></td>
 +
  </tr>
 +
  <tr>
 +
    <td width="101" bgcolor="#C0C0C0"><font size="2"><b><center>32 kiB</center></b></font></td>
 +
    <td width="177"><font size="2"><center>265 MB</center></font></td>
 +
    <td width="178"><font size="2"><center>354 MB</center></font></td>
 +
  </tr>
 +
</table>
 +
 
 +
The table above demonstrates that the larger the cluster size used, the more disk space is wasted due to slack. This suggests it is better to use smaller cluster sizes whenever possible.
 +
 
 +
==FAT Advantages==
 +
*  Files available to multiple operating systems on the same computer
 +
*  Easier to switch from FAT to [[NTFS]] than vice versa
 +
*  Performs faster on smaller volumes (< 10GB)
 +
*  Does not index files, which causes slightly higher performance
 +
*  Performs better with small cache sizes (< 96MB)
 +
*  More space-efficient on small volumes (< 4GB)
 +
*  Performs better with slow disks (< 5400RPM)
 +
 
 +
==FAT Disadvantages==
 +
*  FAT has a fixed maximum number of clusters per partition, which means as the hard disk gets bigger the size of each cluster must increase, creating more slack space
 +
*  Doesn't natively support many abilities of [[NTFS]] such as on-the-fly compression, [[encryption]], or advanced security using access control lists
 +
*  [[NTFS]] recommended by [[Microsoft]] for volumes larger than 32GB
 +
*  FAT slows down as the number of files on the disk increases
 +
*  FAT usually fragments files more
 +
*  FAT does not allow for indexing of files for faster searching
 +
*  FAT does not support user quotas
 +
*  FAT has minimal security features including no access control list (ACL) capability.
 +
 
 +
== External links ==
 +
* http://en.wikipedia.org/wiki/File_Allocation_Table
 +
* http://www.microsoft.com
 +
* http://www.ntfs.com
 +
* http://www.ntfs.com/ntfs_vs_fat.htm
 +
* http://support.microsoft.com/kb/q154997/#XSLTH3126121123120121120120
 +
* http://www.dewassoc.com/kbase/hard_drives/boot_sector.htm
 +
* http://home.teleport.com/~brainy/fat32.htm
 +
* http://www2.tech.purdue.edu/cpt/courses/cpt499s/
 +
* http://home.no.net/tkos/info/fat.html
 +
* http://web.ukonline.co.uk/cook/fat32.htm
 +
* http://www.ntfs.com/fat-systems.htm
 +
* http://www.microsoft.com/whdc/system/platform/firmware/fatgen.mspx
 +
* http://support.microsoft.com/kb/q140418
 +
 
 +
=== ExFAT ===
 +
* [http://en.wikipedia.org/wiki/ExFAT Wikipedia: ExFAT]
 +
* [http://www.active-undelete.com/xfat_volume.htm exFAT File System]
 +
* [http://www.sans.org/reading-room/whitepapers/forensics/reverse-engineering-microsoft-exfat-file-system-33274 Reverse Engineering the Microsoft exFAT File System], by [[Robert Shullich]], December 1, 2009
 +
* [http://paradigmsolutions.files.wordpress.com/2009/12/exfat-excerpt-1-4.pdf Extended FAT file system], by [[Jeff Hamm]], December 2009
 +
* [http://www.slideshare.net/overcertified/demystifying-the-microsoft-extended-fat-file-system-exfat Demystifying the Microsoft Extended FAT File System (exFAT)], by [[Robert Shullich]], September 20, 2010
 +
* [http://aut.researchgateway.ac.nz/bitstream/handle/10292/4123/LeY.pdf Windows Phone 7 : Implications For Digital Forensic Investigators], by [[Yung Anh Le]], 2012
 +
 
 +
=== TexFAT ===
 +
* [http://msdn.microsoft.com/en-us/library/ee490643(v=winembedded.60).aspx TexFAT Overview (Windows Embedded CE 6.0)], by [[Microsoft]]
 +
* [http://www.ntfs.com/exfat-textFAT-padding.htm TexFAT Padding Directory Entry]
 +
 
 +
== Tools ==
 +
=== exFAT ===
 +
* [http://code.google.com/p/exfat/ Open Source exFAT file system implementation]
 +
 
 +
[[Category:File Systems]]

Revision as of 01:00, 23 June 2014

FAT, or File Allocation Table, is a file system that is designed to keep track of the allocation status of clusters on a hard drive. Developed in 1977 by Microsoft Corporation, FAT was originally intended to be a file system for the Microsoft Disk BASIC interpreter. FAT was quickly incorporated into an early version of Tim Paterson's QDOS, which was a moniker for "Quick and Dirty Operating System". Microsoft later purchased the rights to QDOS and released it under Microsoft branding as PC-DOS and later, MS-DOS.

Specification

FAT is described by Microsoft in Microsoft's FAT32 specification. Despite the name, the document includes descriptions of FAT12 and FAT16.

Closely related standards are ECMA-107 and ISO/IEC 9293, which cover only FAT12 and FAT16 and are somewhat more restricted than the file system described by Microsoft's document.

Structure

Boot sector | More reserved sectors (optional) | FAT #1 | FAT #2 | Root directory (FAT12/16 only) | Data region (rest of disk)

Boot Record

When a computer is powered on, a POST (power-on self test) is performed, and control is then transferred to the Master boot record (MBR). The MBR is present no matter what file system is in use, and contains information about how the storage device is logically partitioned. When using a FAT file system, the MBR hands off control of the computer to the Boot Record, which is the first sector on the partition. The Boot Record, which occupies a reserved area on the partition, contains executable code, in addition to information such as an OEM identifier, number of FATs, media descriptor (type of storage device), and information about the operating system to be booted. Once the Boot Record code executes, control is handed off to the operating system installed on that partition.
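
As a rough illustration of the fields mentioned above, the following sketch decodes a few of them from the first sector of a FAT partition. It assumes the standard BIOS Parameter Block layout, with the OEM identifier at byte offset 3 followed by little-endian fields; the function name <code>parse_boot_record</code> and the dictionary keys are hypothetical and chosen only for this example.

```python
import struct


def parse_boot_record(sector: bytes) -> dict:
    """Decode selected Boot Record fields from the first sector of a FAT volume."""
    if len(sector) < 24:
        raise ValueError("boot record is too short")
    # Fields start at offset 3, right after the 3-byte jump instruction.
    (oem, bytes_per_sector, sectors_per_cluster, reserved_sectors,
     number_of_fats, root_entries, total_sectors_16, media_descriptor,
     sectors_per_fat_16) = struct.unpack_from("<8sHBHBHHBH", sector, 3)
    return {
        "oem_identifier": oem.decode("ascii", "replace").rstrip(),
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "reserved_sectors": reserved_sectors,
        "number_of_fats": number_of_fats,
        "root_entries": root_entries,            # 0 on FAT32
        "total_sectors_16": total_sectors_16,    # 0 when the 32-bit field is used
        "media_descriptor": media_descriptor,
        "sectors_per_fat_16": sectors_per_fat_16,  # 0 on FAT32
    }
```

On FAT32 volumes several of these 16-bit fields are zero and the real values live in later 32-bit fields, which this sketch does not read.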

FATs

The primary task of the File Allocation Tables is to keep track of the allocation status of clusters, or logical groupings of sectors, on the disk drive. There are four different possible FAT entries: allocated (along with the address of the next cluster associated with the file), unallocated, end of file, and bad sector.

In order to provide redundancy in case of data corruption, two FATs, FAT1 and FAT2, are stored in the file system. FAT2 is typically a duplicate of FAT1. However, FAT mirroring can be disabled on a FAT32 drive, thus enabling any of the FATs to become the Primary FAT. This possibly leaves FAT1 empty, which can be deceiving.
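
Because an allocated entry holds the address of the next cluster, the clusters of a file form a linked list inside the FAT. The sketch below walks such a chain for FAT16, assuming the table has been read into memory as little-endian 16-bit entries; the function name <code>follow_chain_fat16</code> is hypothetical, and the range 0x0002 to 0xFFEF for "used" entries is taken from the FAT Entry Values listing.

```python
import struct


def follow_chain_fat16(fat: bytes, start_cluster: int) -> list:
    """Return the list of clusters in a chain, starting at start_cluster."""
    chain = []
    visited = set()
    cluster = start_cluster
    # Only values in the "used cluster" range are valid cluster addresses.
    while 0x0002 <= cluster <= 0xFFEF:
        if cluster in visited:
            raise ValueError(f"loop in FAT chain at cluster {cluster}")
        if cluster * 2 + 2 > len(fat):
            raise ValueError(f"cluster {cluster} lies outside the FAT")
        visited.add(cluster)
        chain.append(cluster)
        # Each FAT16 entry is 2 bytes; the entry for cluster N is at offset 2*N.
        (cluster,) = struct.unpack_from("<H", fat, cluster * 2)
    return chain
```

The walk stops at the first entry that is not a valid next-cluster address, such as an end-of-file marker, a free entry, or a bad-cluster marker.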

Root Directory

The Root Directory, sometimes referred to as the Root Folder, contains an entry for each file and directory stored in the file system. Each entry includes the file name, starting cluster number, and file size, and is updated whenever a file is created or subsequently modified. On FAT12 and FAT16 the root directory has a fixed size, typically 512 entries on a hard disk; on a floppy disk the size depends on the disk format. With FAT32 it can be stored anywhere within the partition, although in previous versions it is always located immediately following the FAT region.

Data Area

The Boot Record, FATs, and Root Directory are collectively referred to as the System Area. The remaining space on the logical drive is called the Data Area, which is where files are actually stored. It should be noted that when a file is deleted by the operating system, the data stored in the Data Area remains intact until it is overwritten.

Clusters

In order for FAT to manage files with satisfactory efficiency, it groups sectors into larger blocks referred to as clusters. A cluster is the smallest unit of disk space that can be allocated to a file, which is why clusters are often called allocation units. Each cluster can be used by one and only one resident file. Only the "data area" is divided into clusters; the rest of the partition is simply sectors. Cluster size is determined by the size of the disk volume, and every file must be allocated a whole number of clusters. Cluster sizing has a significant impact on performance and disk utilization. Larger cluster sizes result in more wasted space because files are less likely to fill a whole number of clusters exactly.

The size of one cluster is specified in the Boot Record and can range from a single sector (512 bytes) to 128 sectors (65536 bytes). The sectors in a cluster are contiguous, therefore each cluster is a contiguous block of space on the disk. Note that only one file can be allocated to a cluster. Therefore if a 1KB file is placed within a 32KB cluster there are 31KB of wasted space. The formula for determining the number of clusters in a partition is ((# of Sectors in Partition) - (# of Sectors per FAT * 2) - (# of Reserved Sectors)) / (# of Sectors per Cluster).
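
The cluster-count formula and the wasted-space example can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the function names are hypothetical, and the <code>root_dir_sectors</code> parameter is an addition not present in the formula above, included because on FAT12 and FAT16 the fixed root directory also occupies sectors outside the data area. It defaults to 0, which reproduces the formula as written.

```python
def count_clusters(total_sectors: int, sectors_per_fat: int,
                   reserved_sectors: int, sectors_per_cluster: int,
                   number_of_fats: int = 2, root_dir_sectors: int = 0) -> int:
    """Number of whole clusters that fit in the data area of a partition."""
    data_sectors = (total_sectors
                    - sectors_per_fat * number_of_fats
                    - reserved_sectors
                    - root_dir_sectors)
    # Leftover sectors that do not fill a whole cluster are not addressable.
    return data_sectors // sectors_per_cluster


def wasted_bytes(file_size: int, cluster_size: int) -> int:
    """Bytes allocated to a file but not occupied by its contents."""
    clusters_needed = -(-file_size // cluster_size)  # ceiling division
    return clusters_needed * cluster_size - file_size
```

With these definitions, a 1KB file in a 32KB cluster gives <code>wasted_bytes(1024, 32 * 1024)</code> equal to 31744 bytes, which is the 31KB stated above.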

Wasted Sectors

Wasted Sectors (a.k.a. partition slack) are a result of the number of data sectors not being evenly divisible by the cluster size. It's made up of unused bytes left at the end of a file. Also, if the partition as declared in the partition table is larger than what is claimed in the Boot Record, the volume can be said to have wasted sectors. Small files on a hard drive are the reason for wasted space, and the bigger the hard drive the more wasted space there is.

FAT Entry Values


FAT12

0x000 (Free Cluster)
0x001 (Reserved Cluster)
0x002 - 0xFEF (Used cluster; value points to next cluster)
0xFF0 - 0xFF6 (Reserved values)
0xFF7 (Bad cluster)
0xFF8 - 0xFFF (Last cluster in file)

FAT16

0x0000 (Free Cluster)
0x0001 (Reserved Cluster)
0x0002 - 0xFFEF (Used cluster; value points to next cluster)
0xFFF0 - 0xFFF6 (Reserved values)
0xFFF7 (Bad cluster)
0xFFF8 - 0xFFFF (Last cluster in file)

FAT32

0x?0000000 (Free Cluster)
0x?0000001 (Reserved Cluster)
0x?0000002 - 0x?FFFFFEF (Used cluster; value points to next cluster)
0x?FFFFFF0 - 0x?FFFFFF6 (Reserved values)
0x?FFFFFF7 (Bad cluster)
0x?FFFFFF8 - 0x?FFFFFFF (Last cluster in file)

Note: FAT32 uses only 28 of the 32 possible bits; the upper 4 bits should be left alone. Typically these bits are zero, and they are represented above by a question mark (?).
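The value ranges listed above follow the same pattern at each width, so one function can classify an entry for all three variants. This is a minimal sketch: the function name and the returned labels are illustrative, and for FAT32 the upper 4 bits are masked off before comparison, as the note above suggests.

```python
def classify_fat_entry(value, fat_bits):
    """Map a raw FAT entry to the categories listed in the tables above."""
    if fat_bits not in (12, 16, 32):
        raise ValueError("fat_bits must be 12, 16, or 32")
    width_mask = (1 << fat_bits) - 1
    if not 0 <= value <= width_mask:
        raise ValueError("value does not fit in a FAT entry of this width")

    if fat_bits == 32:
        value &= 0x0FFFFFFF        # ignore the upper 4 bits
        top = 0x0FFFFFFF
    else:
        top = width_mask           # 0xFFF or 0xFFFF

    if value == 0:
        return "free"
    if value == 1:
        return "reserved"
    if value >= top - 0x7:         # ...F8 through ...FF
        return "last cluster in file"
    if value == top - 0x8:         # ...F7
        return "bad cluster"
    if value >= top - 0xF:         # ...F0 through ...F6
        return "reserved"
    return "used"                  # value is the number of the next cluster


print(classify_fat_entry(0xFF7, 12))
print(classify_fat_entry(0x0003, 16))
print(classify_fat_entry(0xF0000000, 32))
```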

Versions

There are three variants of FAT in existence: FAT12, FAT16, and FAT32.

FAT12

  • FAT12 is the oldest type of FAT and uses a 12-bit file allocation table entry.
  • FAT12 can hold a max of 4,084 clusters (which is 2^12 clusters minus a few values that are reserved for values used in the FAT).
  • It is used for floppy disks and hard drive partitions that are smaller than 16 MB.
  • All 1.44 MB 3.5" floppy disks are formatted using FAT12.
  • Cluster sizes range from 0.5 KB to 4 KB.

FAT16

  • It is called FAT16 because all entries are 16 bits wide.
  • FAT16 can hold a max of 65,524 addressable units.
  • It is used for small and moderate sized hard disk volumes.
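The two limits quoted above (4,084 and 65,524) are both a power of two minus twelve. The sketch below reproduces them and picks the narrowest variant that can address a given number of clusters. The constant 12 is inferred from those two figures rather than taken from a specification, and the selection function is only an illustration of how the limits relate.

```python
# Inferred from the figures in the text: 4,096 - 4,084 and 65,536 - 65,524.
VALUES_NOT_USABLE = 12


def max_clusters(fat_bits):
    """Largest cluster count quoted in the text for a 12- or 16-bit FAT."""
    return 2 ** fat_bits - VALUES_NOT_USABLE


def narrowest_fat_variant(cluster_count):
    """Narrowest FAT variant whose quoted limit covers cluster_count."""
    for bits in (12, 16):
        if cluster_count <= max_clusters(bits):
            return f"FAT{bits}"
    return "FAT32"


print(max_clusters(12), max_clusters(16))
print(narrowest_fat_variant(2_847))    # roughly a 1.44 MB floppy's cluster count
print(narrowest_fat_variant(70_000))
```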

FAT32

FAT32 is the enhanced version of the FAT system implemented beginning with Windows 95 OSR2, Windows 98, and Windows Me. Features include:

  • Drives of up to 2 terabytes are supported (Windows 2000 only supports up to 32 gigabytes)
  • Since FAT32 uses smaller clusters (4 kilobytes each on drives up to 8 GB), it uses hard drive space more efficiently. This is a 10 to 15 percent improvement over FAT or FAT16.
  • The limitations of FAT or FAT 16 on the number of root folder entries have been eliminated. In FAT32, the root folder is an ordinary cluster chain, and can be located anywhere on the drive.
  • File allocation mirroring can be disabled in FAT32. This allows a copy of the file allocation table other than the default to be active.

Limitations with Windows 2000 & Windows XP

  • Clusters cannot be 64KB or larger.
  • Cluster size cannot be decreased to the point where the FAT would be larger than 16 MB minus 64KB in size.
  • Cannot contain fewer than 65,527 clusters.
  • Maximum of 32KB per cluster.
  • Windows XP: The Windows XP installation program will not allow a user to format a drive of more than 32GB using the FAT32 file system. Using the installation program, the only way to format a disk greater than 32GB in size is to use NTFS. A disk larger than 32GB can still be formatted with FAT32 for use with Windows XP if the system is booted from a Windows 98 or Windows ME startup disk and the disk is formatted using the format tool on that startup disk.
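The FAT size limit and the cluster size limit above together imply a ceiling on volume size. The arithmetic below is a sketch that assumes each FAT32 entry occupies 4 bytes and ignores the handful of entries that do not map to data clusters, so the result is an upper bound rather than an exact figure.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

FAT32_ENTRY_BYTES = 4                   # assumption: one 32-bit entry per cluster
MAX_FAT_BYTES = 16 * MIB - 64 * KIB     # limit stated in the list above
MAX_CLUSTER_BYTES = 32 * KIB            # limit stated in the list above


def max_fat_entries():
    """How many entries fit in the largest permitted FAT."""
    return MAX_FAT_BYTES // FAT32_ENTRY_BYTES


def max_volume_bytes(cluster_bytes=MAX_CLUSTER_BYTES):
    """Approximate data capacity when every FAT entry maps one cluster."""
    return max_fat_entries() * cluster_bytes


print(max_fat_entries())
print(max_volume_bytes() / GIB)
```

Under these assumptions the FAT holds 4,177,920 entries and the data capacity is 127.5 GiB.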

exFAT (sometimes incorrectly called FAT64)

exFAT (also known as Extended File Allocation Table) is Microsoft's latest version of FAT and works with Windows Embedded CE 6.0, Windows XP/Server 2003 (with a KB patch), Vista/Server 2008 SP1 and later, and Windows 7. Features include:

  • Largest file size is 2^64 bytes (16 exabytes) vs. FAT32's maximum file size of 4GB.
  • Has transaction support using Transaction-Safe Extended FAT File System (TexFAT). (Not released yet in Desktop/Server OS)
  • Speeds up storage allocation processes by using free space bitmaps.
  • Supports UTC timestamps (Vista/Server 2008 SP1 does not support UTC; UTC support came out with SP2)
  • Maximum cluster size of 32MB (FAT32's is 32KB)
  • Sector sizes from 512 bytes to 4096 bytes in size
  • Maximum FAT supportable volume size of 128PB
  • Maximum subdirectory size of 256MB, which can support over 2 million files in a single subdirectory
  • Uses a Bitmap for cluster allocation
  • Supports File Permissions (Not released yet in Desktop/Server OS)
  • Has been selected as the exclusive file system of the SDXC memory card by the SD Association
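Two of the figures in the list above can be reproduced with simple arithmetic. The sketch assumes that exFAT directory entries are 32 bytes each and that a file needs at least three of them (a file entry, a stream entry, and one name entry); both values are assumptions not stated in the text above.

```python
MIB = 2 ** 20
EIB = 2 ** 60

DIRECTORY_ENTRY_BYTES = 32      # assumption: size of one exFAT directory entry
MIN_ENTRIES_PER_FILE = 3        # assumption: file + stream + one file name entry

max_file_bytes = 2 ** 64
max_subdirectory_bytes = 256 * MIB

# Largest file size expressed in binary exabytes.
print(max_file_bytes // EIB)

# Upper bound on files per subdirectory when every file uses the minimum
# number of directory entries (i.e. has a short name).
entries = max_subdirectory_bytes // DIRECTORY_ENTRY_BYTES
print(entries // MIN_ENTRIES_PER_FILE)
```

Files with longer names need more name entries, so the practical limit per subdirectory is lower than this bound.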

Although Microsoft has published some information on exFAT, there are more technical specifications available from third parties. For example, here is a detailed presentation on exFAT.

Another published technical paper that goes into the internals in great detail is in the SANS Reading Room: Reverse Engineering the Microsoft exFAT File System

Comparison of FAT Versions

See the table at http://en.wikipedia.org/wiki/File_Allocation_Table for more detailed information about the various versions of FAT.

Uses

Due to its low cost, mobility, and non-volatile nature, flash memory has quickly become the choice medium for storing and transferring data in consumer electronic devices. The majority of flash memory storage is formatted using the FAT file system. In addition, FAT is also frequently used in electronic devices with miniature hard drives.

Examples of devices in which FAT is utilized include:

  • USB thumb drives
  • Digital cameras
  • Digital camcorders
  • Portable audio and video players
  • Multifunction printers
  • Electronic photo frames
  • Electronic musical instruments
  • Standard televisions
  • PDAs

Data Recovery

Recovering directory entries from FAT filesystems as part of recovering deleted data can be accomplished by looking for entries whose first byte is 0xE5 (displayed as a lowercase sigma, σ, in the original IBM PC character set). When a file or directory is deleted under a FAT filesystem, the first character of its name is overwritten with this value. The remainder of the directory entry information remains intact.
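A scan for deleted entries can be sketched as follows. The code assumes the conventional 32-byte short-name directory entry layout (name in bytes 0 to 10, attributes in byte 11, starting cluster split across bytes 20 to 21 and 26 to 27, file size in bytes 28 to 31); the function name and the returned dictionary keys are illustrative.

```python
import struct

ENTRY_SIZE = 32
DELETED_MARKER = 0xE5
END_OF_DIRECTORY = 0x00
ATTR_LONG_NAME = 0x0F   # long file name entries use a different layout


def find_deleted_entries(directory_bytes):
    """Return the deleted short-name entries found in a raw directory."""
    deleted = []
    for offset in range(0, len(directory_bytes) - ENTRY_SIZE + 1, ENTRY_SIZE):
        entry = directory_bytes[offset:offset + ENTRY_SIZE]
        if entry[0] == END_OF_DIRECTORY:
            break                       # no entries follow this one
        if entry[0] != DELETED_MARKER or entry[11] == ATTR_LONG_NAME:
            continue
        # The first name character is lost; show it as "?".
        name = "?" + entry[1:8].decode("ascii", "replace").rstrip()
        ext = entry[8:11].decode("ascii", "replace").rstrip()
        (cluster_hi,) = struct.unpack_from("<H", entry, 20)
        (cluster_lo,) = struct.unpack_from("<H", entry, 26)
        (size,) = struct.unpack_from("<I", entry, 28)
        deleted.append({
            "offset": offset,
            "name": f"{name}.{ext}" if ext else name,
            "start_cluster": (cluster_hi << 16) | cluster_lo,
            "size": size,
        })
    return deleted
```

Given the bytes of a directory read from an image, each returned record carries the starting cluster and file size that a recovery tool needs.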

The FAT entries for each cluster used by the file are also changed to zero, which marks those clusters as free. Recovery tools therefore start from the directory entry for the file: the location of the starting cluster is still recorded there and is not deleted or modified. The tool will go straight to that cluster and try to recover the file, using the file size to determine the number of clusters to recover. Some tools will go to the starting cluster and simply recover the next "X" clusters needed for the specific file size. This approach is not ideal, because it assumes the file was stored contiguously. A better tool will instead collect the next "X" available (unallocated) clusters. Since files are often fragmented, this is a more precise way to recover the file.

An issue arises when two files in the same run of clusters are deleted. If the clusters are not in sequential order, the tool will still retrieve "X" clusters, but because the file was fragmented, it is likely that not all of the clusters obtained contain data for that file. If the two deleted files are interleaved in the same run of clusters, it is highly unlikely that the file can be recovered intact.
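The two recovery strategies described above can be contrasted in a small sketch. The FAT is modelled as a plain list indexed by cluster number in which zero means free; the function names and the sample FAT are hypothetical.

```python
def clusters_needed(file_size, cluster_size):
    """Number of clusters ("X") required to hold file_size bytes."""
    return -(-file_size // cluster_size)  # ceiling division


def recover_consecutive(start_cluster, count):
    """Naive strategy: take the next `count` clusters, allocated or not."""
    return list(range(start_cluster, start_cluster + count))


def recover_next_free(fat, start_cluster, count):
    """Better strategy: take the next `count` clusters that are marked free."""
    chosen = []
    cluster = start_cluster
    while len(chosen) < count and cluster < len(fat):
        if fat[cluster] == 0:
            chosen.append(cluster)
        cluster += 1
    return chosen


# Clusters 0 and 1 are not data clusters. Cluster 3 belongs to a live file,
# so the deleted file that started at cluster 2 must have skipped over it.
fat = [0xFFF8, 0xFFFF, 0, 0xFFFF, 0, 0, 0]
count = clusters_needed(5000, 2048)

print(recover_consecutive(2, count))
print(recover_next_free(fat, 2, count))
```

In this example the naive strategy includes cluster 3, which holds another file's data, while the free-cluster strategy skips it. Neither strategy can tell two interleaved deleted files apart, which is the failure case described above.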

File Slack

File slack is the space that starts at the end of the data written for a file and continues to the end of the last cluster allocated to the file. There are two types of file slack: RAM slack and residual slack. RAM slack starts at the end of the file and goes to the end of that sector. Residual slack then starts at the next sector and goes to the end of the cluster allocated for the file. File slack is useful when analyzing a hard drive because old data that has not been overwritten by the new file is still intact. See http://www.pcguide.com/ref/hdd/file/partSizes-c.html for examples.


Cluster Size    Sample Slack Space,            Sample Slack Space,
                50% Cluster Slack Per File     67% Cluster Slack Per File
2 KiB           17 MB                          22 MB
4 KiB           33 MB                          44 MB
8 KiB           66 MB                          89 MB
16 KiB          133 MB                         177 MB
32 KiB          265 MB                         354 MB

The table above demonstrates that the larger the cluster size used, the more disk space is wasted due to slack. This suggests it is better to use smaller cluster sizes whenever possible.
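The split between RAM slack and residual slack defined in this section can be computed from a file's size and the volume geometry. This is a minimal sketch; the function name and the default geometry (512-byte sectors, 8 sectors per cluster) are illustrative.

```python
def file_slack(file_size, sector_size=512, sectors_per_cluster=8):
    """Return (ram_slack, residual_slack) in bytes for a file of file_size."""
    cluster_size = sector_size * sectors_per_cluster
    # RAM slack: from the end of the file to the end of its last sector.
    ram_slack = (-file_size) % sector_size
    # Residual slack: from the next sector to the end of the last cluster.
    end_of_last_sector = file_size + ram_slack
    residual_slack = (-end_of_last_sector) % cluster_size
    return ram_slack, residual_slack


# A 1,000-byte file on a volume with 4 KiB clusters.
ram, residual = file_slack(1000)
print(ram, residual, ram + residual)
```

For the 1,000-byte example the two parts add up to 3,096 bytes, which is the 4,096-byte cluster minus the file's contents.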

FAT Advantages

  • Files available to multiple operating systems on the same computer
  • Easier to switch from FAT to NTFS than vice versa
  • Performs faster on smaller volumes (< 10GB)
  • Does not index files, which yields slightly higher performance
  • Performs better with small cache sizes (< 96MB)
  • More space-efficient on small volumes (< 4GB)
  • Performs better with slow disks (< 5400RPM)

FAT Disadvantages

  • FAT has a fixed maximum number of clusters per partition, which means as the hard disk gets bigger the size of each cluster must increase, creating more slack space
  • Doesn't natively support many abilities of NTFS such as on-the-fly compression, encryption, or advanced security using access control lists
  • NTFS is recommended by Microsoft for volumes larger than 32GB
  • FAT slows down as the number of files on the disk increases
  • FAT usually fragments files more
  • FAT does not allow for indexing of files for faster searching
  • FAT does not support user quotas
  • FAT has minimal security features including no access control list (ACL) capability.

External links

ExFAT

textFAT

Tools

exFAT