Difference between pages "Email Headers" and "File Carving"

From ForensicsWiki
(Difference between pages)
(Mail User Agents: - Cleaned up text)
 
(Memory Carving)
 
Line 1: Line 1:
'''Email Headers''' are lines of [[metadata]] attached to each [[email]] that contain lots of useful information for a [[forensic investigator]]. However, email headers can be easily forged, so they should never be used as the only source of information.
+
'''File Carving,''' or sometimes simply '''Carving,''' is the practice of searching an input for files or other kinds of objects based on content, rather than on metadata. File carving is a powerful tool for recovering files and fragments of files when directory entries are corrupt or missing, as may be the case with old files that have been deleted or when performing an analysis on damaged media. Memory carving is a useful tool for analyzing physical and virtual memory dumps when the memory structures are unknown or have been overwritten.
  
== Making Sense of Headers ==
 
  
There is no single way to make sense of email headers. Some examiners favor reading from the bottom up, some favor reading from the top down. Because information in the headers can be put there by the user's [[Mail User Agent|MUA]], a server in transit, or the recipient's [[Mail User Agent|MUA]], it can be difficult to determine when a line was added.
+
Most file carvers operate by looking for file headers and/or footers, and then "carving out" the blocks between these two boundaries. [[Semantic Carving]] performs carving based on an analysis of the contents of the proposed files.  
  
=== Mail User Agents ===
+
File carving should be done on a [[disk image]], rather than on the original disk.
{{main|List of MUA Header Formats}}
+
Every [[Mail User Agent|MUA]] sets up the headers for a message slightly differently. Although some headers are required under the applicable [http://www.faqs.org/rfcs/rfc2822.html RFC], their format and ordering can vary by client. Almost all clients, however, add their headers in a fixed format and order.
+
The examiner can use the format and order for each client to show that messages were forged, but not that they were legitimate. For example, if a message purports to be from [[Apple Mail]] but the order of the headers does not match the [[Apple Mail Header Format]], the message has been forged. If the headers of the message do match that format, however, it does not guarantee that the message was sent by that program.
+
  
=== Servers in Transit ===
+
File carving tools are listed on the [[Tools:Data_Recovery]] wiki page.
  
Mail servers can add lines onto email headers, usually in the form of "Received" lines, like this:
+
Many carving programs have an option to only look at or near sector boundaries where headers are found. However, searching the entire input can find files that have been embedded into other files, such as [[JPEG]]s being embedded into [[Microsoft]] [[DOC|Word documents]]. This may be considered an advantage or a disadvantage, depending on the circumstances.
<pre>Received: by servername.recipeienthost.com (Postfix, from userid 506)
+
id 77C30808A; Sat, 24 Feb 2007 20:43:56 -0500 (EST)</pre>
+
  
== Message Id Field ==
+
The majority of file carving programs will only recover files that are contiguous on the media (in other words, files that are not fragmented).
  
According to the current guidelines for email ([http://www.faqs.org/rfcs/rfc2822.html RFC 2822]), every email should have a Message-ID field:
+
== Fragmented File Recovery ==
<pre>  The "Message-ID:" field provides a unique message identifier that
+
[[Simson Garfinkel]] estimated that up to 58% of Outlook, 17% of JPEG, and 16% of MS-Word files are fragmented and, therefore, appear corrupted or missing to a user using traditional data carving. The first set of file carving programs that can handle fragmented files automatically has finally arrived.
  refers to a particular version of a particular message. The
+
[[User:PashaPal|A. Pal]], [[User:NasirMemon|N. Memon]], T. Sencar, and K. Shanmugasundaram have introduced a technique called [[File_Carving:SmartCarving|SmartCarving]] that can recover fragmented files.
  uniqueness of the message identifier is guaranteed by the host that
+
  generates it (see below). This message identifier is intended to be
+
  machine readable and not necessarily meaningful to humans. A message
+
  identifier pertains to exactly one instantiation of a particular
+
  message; subsequent revisions to the message each receive new message
+
  identifiers.
+
  
  ...
+
== File Carving Taxonomy==
 +
[[Simson Garfinkel]] and [[Joachim Metz]] have proposed the following file carving taxonomy:
  
  The message identifier (msg-id) itself MUST be a globally unique
+
;Carving
  identifier for a message.  The generator of the message identifier
+
:General term for extracting data (files) out of undifferentiated blocks (raw data), like "carving" a sculpture out of soapstone.
  MUST guarantee that the msg-id is unique.  There are several
+
  algorithms that can be used to accomplish this.  Since the msg-id has
+
  a similar syntax to angle-addr (identical except that comments and
+
  folding white space are not allowed), a good method is to put the
+
  domain name (or a domain literal IP address) of the host on which the
+
  message identifier was created on the right hand side of the "@", and
+
  put a combination of the current absolute date and time along with
+
  some other currently unique (perhaps sequential) identifier available
+
  on the system (for example, a process id number) on the left hand
+
  side.  Using a date on the left hand side and a domain name or domain
+
  literal on the right hand side makes it possible to guarantee
+
  uniqueness since no two hosts use the same domain name or IP address
+
  at the same time.  Though other algorithms will work, it is
+
  RECOMMENDED that the right hand side contain some domain identifier
+
  (either of the host itself or otherwise) such that the generator of
+
  the message identifier can guarantee the uniqueness of the left hand
+
  side within the scope of that domain.</pre>
+
  
Where known, the Message-ID algorithms for known programs are given on the separate pages for those programs.
+
;Block-Based Carving
 +
:Any carving method (algorithm) that analyzes the input on a block-by-block basis to determine if a block is part of a possible output file. This method assumes that each block can only be part of a single file (or embedded file).
  
== Sample Header ==
+
;Statistical Carving
 +
:Any carving method (algorithm) that analyzes the input based on a characteristic or statistic (for example, entropy) to determine if the input is part of a possible output file.
  
This is an (incomplete) excerpt from an email header:
+
;Header/Footer Carving
 +
:A method for carving files out of raw data using a distinct header (start of file marker) and footer (end of file marker).
  
Received: from lists.securityfocus.com (lists.securityfocus.com [205.206.231.19])
+
;Header/Maximum (file) size Carving
        by outgoing2.securityfocus.com (Postfix) with QMQP
+
:A method for carving files out of raw data using a distinct header (start of file marker) and a maximum (file) size. This approach works because many file formats (e.g. JPEG, MP3) do not care if additional junk is appended to the end of a valid file.
        id 7E9971460C9; Mon,  9 Jan 2006 08:01:36 -0700 (MST)
+
Mailing-List: contact forensics-help@securityfocus.com; run by ezmlm
+
Precedence: bulk
+
List-Id: <forensics.list-id.securityfocus.com>
+
List-Post: <mailto:forensics@securityfocus.com>
+
List-Help: <mailto:forensics-help@securityfocus.com>
+
List-Unsubscribe: <mailto:forensics-unsubscribe@securityfocus.com>
+
List-Subscribe: <mailto:forensics-subscribe@securityfocus.com>
+
Delivered-To: mailing list forensics@securityfocus.com
+
Delivered-To: moderator for forensics@securityfocus.com
+
Received: (qmail 20564 invoked from network); 5 Jan 2006 16:11:57 -0000
+
From: YJesus <yjesus@security-projects.com>
+
To: forensics@securityfocus.com
+
Subject: New Tool : Unhide
+
User-Agent: KMail/1.9
+
MIME-Version: 1.0
+
Content-Disposition: inline
+
Date: Thu, 5 Jan 2006 16:41:30 +0100
+
Content-Type: text/plain;
+
  charset="iso-8859-1"
+
Content-Transfer-Encoding: quoted-printable
+
Message-Id: <200601051641.31830.yjesus@security-projects.com>
+
X-HE-Spam-Level: /
+
X-HE-Spam-Score: 0.0
+
X-HE-Virus-Scanned: yes
+
Status: RO
+
Content-Length: 586
+
Lines: 26
+
  
== External Links ==
+
;Header/Embedded Length Carving
 +
:A method for carving files out of raw data using a distinct header and a file length (size) which is embedded in the file format.
  
* http://en.wikipedia.org/wiki/Computer_forensics#E-mail_Headers
+
;File structure based Carving
* http://www.forensictracer.com software for forensic analysis of internet resources
+
:A method for carving files out of raw data using a certain level of knowledge of the internal structure of file types. Garfinkel called this approach "Semantic Carving" in his DFRWS2006 carving challenge submission, while Metz and Mora called the approach "Deep Carving."
 +
 
 +
;Semantic Carving
 +
:A method for carving files based on a linguistic analysis of the file's content. For example, a semantic carver might conclude that six blocks of French in the middle of a long HTML file written in English are a fragment left from a previously allocated file, and not from the English-language HTML file.
 +
 
 +
;Carving with Validation
 +
:A method for carving files out of raw data where the carved files are validated using a file type specific validator.
 +
 
 +
;Fragment Recovery Carving
 +
:A carving method in which two or more fragments are reassembled to form the original file or object. Garfinkel previously called this approach "Split Carving."
 +
 
 +
;Repackaging Carving
 +
:A carving method that modifies the extracted data by adding new headers, footers, or other information so that it can be viewed with standard utilities. For example, Garfinkel's [[ZIP Carver]] looks for individual components of a ZIP file and repackages them with a new Central Directory so that they can be opened with a standard unzip utility.
 +
 
 +
== File Carving challenges and test images ==
 +
 
 +
[http://www.dfrws.org/2006/challenge/ File Carving Challenge] - [[Digital Forensic Research Workshop|DFRWS]] 2006
 +
 
 +
[http://www.dfrws.org/2007/challenge/ File Carving Challenge] - [[Digital Forensic Research Workshop|DFRWS]] 2007
 +
 
 +
[http://dftt.sourceforge.net/test6/index.html FAT Undelete Test #1] - Digital Forensics Tool Testing Image (dftt #6)
 +
 
 +
[http://dftt.sourceforge.net/test7/index.html NTFS Undelete (and leap year) Test #1] - Digital Forensics Tool Testing Image (dftt #7)
 +
 
 +
[http://dftt.sourceforge.net/test11/index.html Basic Data Carving Test - fat32], Nick Mikus - Digital Forensics Tool Testing Image (dftt #11)
 +
 
 +
[http://dftt.sourceforge.net/test12/index.html Basic Data Carving Test - ext2],  Nick Mikus - Digital Forensics Tool Testing Image (dftt #12)
 +
 
 +
== See also ==
 +
* [[Tools:Data_Recovery#Carving | File Carving Tools]]
 +
* [[File Carving Bibliography]]
 +
* [[Carver 2.0 Planning Page]]
 +
* [[File Carving:SmartCarving|SmartCarving]]
 +
 
 +
=Memory Carving=
 +
 
 +
== External Links ==
 +
* [http://sourceforge.net/projects/revit/files/Documentation/Master%20Thesis%20-%20Advanced%20File%20Carving/ Measuring and Improving the Quality of File Carving Methods], by [[Bas Kloet]]

Revision as of 04:45, 31 July 2012

File Carving, or sometimes simply Carving, is the practice of searching an input for files or other kinds of objects based on content, rather than on metadata. File carving is a powerful tool for recovering files and fragments of files when directory entries are corrupt or missing, as may be the case with old files that have been deleted or when performing an analysis on damaged media. Memory carving is a useful tool for analyzing physical and virtual memory dumps when the memory structures are unknown or have been overwritten.


Most file carvers operate by looking for file headers and/or footers, and then "carving out" the blocks between these two boundaries. Semantic Carving performs carving based on an analysis of the contents of the proposed files.
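The header/footer approach described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular carving tool: the function name `carve_header_footer` and the `max_size` limit are assumptions made for the example, and the signatures used are the JPEG start-of-image (`FF D8`, followed by the `FF` of the next marker) and end-of-image (`FF D9`) markers. It only recovers contiguous data and will cut a file short if the footer bytes happen to occur inside it (for example, in an embedded thumbnail).

```python
JPEG_HEADER = b"\xff\xd8\xff"  # JPEG start-of-image marker plus first byte of next marker
JPEG_FOOTER = b"\xff\xd9"      # JPEG end-of-image marker


def carve_header_footer(data, header=JPEG_HEADER, footer=JPEG_FOOTER,
                        max_size=10 * 1024 * 1024):
    """Return (offset, carved_bytes) for each header..footer span in data.

    max_size is a hypothetical cap on how far past a header to look
    for the matching footer.
    """
    results = []
    pos = 0
    while True:
        start = data.find(header, pos)
        if start == -1:
            break
        # Look for the footer after the header, within max_size bytes.
        end = data.find(footer, start + len(header), start + max_size)
        if end == -1:
            pos = start + 1  # no footer in range; keep scanning
            continue
        end += len(footer)
        results.append((start, data[start:end]))
        pos = end
    return results


if __name__ == "__main__":
    # Synthetic input standing in for a disk image.
    image = b"\x00" * 100 + b"\xff\xd8\xff\xe0" + b"JFIFdata" + b"\xff\xd9" + b"\x00" * 50
    for offset, blob in carve_header_footer(image):
        print(offset, len(blob))
```

In practice the input would be read from a disk image file rather than built in memory, and each carved span would be written out for later validation.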

File carving should be done on a disk image, rather than on the original disk.

File carving tools are listed on the Tools:Data_Recovery wiki page.

Many carving programs have an option to only look at or near sector boundaries where headers are found. However, searching the entire input can find files that have been embedded into other files, such as JPEGs being embedded into Microsoft Word documents. This may be considered an advantage or a disadvantage, depending on the circumstances.
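The trade-off between the two search modes can be illustrated with a short Python sketch. The function name `find_headers` and the 512-byte sector size are assumptions for the example (real media may use other sector sizes), and the aligned mode here checks only the exact start of each sector rather than "at or near" it.

```python
SECTOR_SIZE = 512  # assumed sector size for this sketch


def find_headers(data, header, sector_aligned=True, sector_size=SECTOR_SIZE):
    """Return the offsets in data at which header occurs.

    With sector_aligned=True only the first bytes of each sector are
    checked; otherwise every byte offset is searched.
    """
    if sector_aligned:
        return [off for off in range(0, len(data), sector_size)
                if data.startswith(header, off)]
    offsets = []
    pos = data.find(header)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(header, pos + 1)
    return offsets


if __name__ == "__main__":
    header = b"\xff\xd8\xff"
    buf = bytearray(2048)
    buf[512:515] = header   # header at a sector boundary
    buf[700:703] = header   # header embedded mid-sector, e.g. inside another file
    data = bytes(buf)
    print(find_headers(data, header, sector_aligned=True))
    print(find_headers(data, header, sector_aligned=False))
```

The aligned search is faster and reports fewer spurious hits, while the full search also reports the header that sits inside another object.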

The majority of file carving programs will only recover files that are contiguous on the media (in other words, files that are not fragmented).

Fragmented File Recovery

Simson Garfinkel estimated that up to 58% of Outlook, 17% of JPEG, and 16% of MS-Word files are fragmented and, therefore, appear corrupted or missing to a user using traditional data carving. The first set of file carving programs that can handle fragmented files automatically has finally arrived. A. Pal, N. Memon, T. Sencar, and K. Shanmugasundaram have introduced a technique called SmartCarving that can recover fragmented files.

File Carving Taxonomy

Simson Garfinkel and Joachim Metz have proposed the following file carving taxonomy:

Carving
General term for extracting data (files) out of undifferentiated blocks (raw data), like "carving" a sculpture out of soapstone.
Block-Based Carving
Any carving method (algorithm) that analyzes the input on a block-by-block basis to determine if a block is part of a possible output file. This method assumes that each block can only be part of a single file (or embedded file).
Statistical Carving
Any carving method (algorithm) that analyzes the input based on a characteristic or statistic (for example, entropy) to determine if the input is part of a possible output file.
Header/Footer Carving
A method for carving files out of raw data using a distinct header (start of file marker) and footer (end of file marker).
Header/Maximum (file) size Carving
A method for carving files out of raw data using a distinct header (start of file marker) and a maximum (file) size. This approach works because many file formats (e.g. JPEG, MP3) do not care if additional junk is appended to the end of a valid file.
Header/Embedded Length Carving
A method for carving files out of raw data using a distinct header and a file length (size) which is embedded in the file format.
File structure based Carving
A method for carving files out of raw data using a certain level of knowledge of the internal structure of file types. Garfinkel called this approach "Semantic Carving" in his DFRWS2006 carving challenge submission, while Metz and Mora called the approach "Deep Carving."
Semantic Carving
A method for carving files based on a linguistic analysis of the file's content. For example, a semantic carver might conclude that six blocks of French in the middle of a long HTML file written in English are a fragment left from a previously allocated file, and not from the English-language HTML file.
Carving with Validation
A method for carving files out of raw data where the carved files are validated using a file type specific validator.
Fragment Recovery Carving
A carving method in which two or more fragments are reassembled to form the original file or object. Garfinkel previously called this approach "Split Carving."
Repackaging Carving
A carving method that modifies the extracted data by adding new headers, footers, or other information so that it can be viewed with standard utilities. For example, Garfinkel's ZIP Carver looks for individual components of a ZIP file and repackages them with a new Central Directory so that they can be opened with a standard unzip utility.
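As an illustration of the statistic mentioned under Statistical Carving in the taxonomy above, the following Python sketch computes the Shannon entropy of each block of an input. The function names and the 512-byte block size are assumptions for the example; how a carver would then use the values (thresholds, comparison with neighbouring blocks) is not specified by the taxonomy and is left out here.

```python
import math
from collections import Counter


def shannon_entropy(block):
    """Shannon entropy of a bytes object, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)  # frequency of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def block_entropies(data, block_size=512):
    """Return the entropy of each consecutive block_size-byte block of data."""
    return [shannon_entropy(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


if __name__ == "__main__":
    zeros = bytes(512)               # a single repeated value: lowest entropy
    uniform = bytes(range(256)) * 2  # every byte value equally often: highest entropy
    for value in block_entropies(zeros + uniform):
        print(round(value, 3))
```

Blocks of compressed or encrypted data tend to score close to 8 bits per byte, while text and sparse structures score lower, which is what makes entropy usable as a rough indicator of which kind of file a block may belong to.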

File Carving challenges and test images

File Carving Challenge - DFRWS 2006

File Carving Challenge - DFRWS 2007

FAT Undelete Test #1 - Digital Forensics Tool Testing Image (dftt #6)

NTFS Undelete (and leap year) Test #1 - Digital Forensics Tool Testing Image (dftt #7)

Basic Data Carving Test - fat32, Nick Mikus - Digital Forensics Tool Testing Image (dftt #11)

Basic Data Carving Test - ext2, Nick Mikus - Digital Forensics Tool Testing Image (dftt #12)

See also

Memory Carving

External Links