Difference between pages "SIM Cards" and "Carver 2.0 Planning Page"

From Forensics Wiki
(Difference between pages)
 
(Validator Construction)
 
Line 1: Line 1:
[[Image:Simpic.jpg|thumb|A typical SIM card.]]
+
This page is for planning Carver 2.0.
  
== SIM-Subscriber Identity Module ==
+
Please, do not delete text (ideas) here. Use something like this:
  
The UICC (Universal Integrated Circuit Card) is a smart card which contains account information and memory that is used to enable GSM cellular telephones.  One of the applications running on the smart card is the SIM, or Subscriber Identity Module. In common parlance the term "UICC" is rarely used, and "SIM" is used to describe the smart card itself.
+
<pre>
 +
<s>bad idea</s>
 +
:: good idea
 +
</pre>
  
Because the SIM is just one of several applications running on the smart card, a given card could, in theory, contain multiple SIMs. This would allow multiple phone numbers or accounts to be accessed by a single UICC. This is seldom seen, though there is at least one "12-in-1" SIM card being advertised at present.
+
This will look like:
  
Early versions of the UICC used full-size smart cards (85mm x 54mm x 0.76mm).  The card has since been shrunk to the standard size of 25mm x 15mm x 0.76mm.
+
<s>bad idea</s>
 +
:: good idea
  
 +
= License =
  
Although UICC cards traditionally held just 16 to 64KB of memory, the recent trend has been to produce SIM cards with larger storage capacities, ranging from 512MB up to [http://www.m-systems.com/site/en-US/ M-Systems'] 1GB SIM Card slated for release in late 2006.
+
BSD-3.
  
== ICCID ==
+
= OS =
  
Each SIM is internationally identified by its ICC-ID (Integrated Circuit Card ID). ICC-IDs are stored in the SIM card and can also be engraved or printed on the SIM card’s body during a process called personalization. The number is up to 18 digits long with an addition of a single “check digit” that is used for error detection.  This single digit allows us to detect an input error of digits, mistyped digits or a permutation of two successive digits.  This digit was calculated using the Luhn algorithm.
+
Linux/FreeBSD/MacOS
A typical SIM (19 digits) example 89 91 10 1200 00 320451 0, provide several details as follows:
+
: (shouldn't this just match what the underlying afflib & sleuthkit cover? [[User:RB|RB]])
• The first two digits (89 in the example) refers to the Telecom Id.
+
:: Yes, but you need to test and validate on each. Question: Do we want to support windows? [[User:Simsong|Simsong]] 21:09, 30 October 2008 (UTC)
• The next two digits (91 in the example) refers to the country code (91-India).
+
:: [[User:Joachim Metz|Joachim]] I think we would do wise to design with windows support from the start this will improve the platform independence from the start
• The next two digits (10 in the example) refers to the network code.
+
• The next four digits (1200 in the example) refers to the month and year of manufacturing.
+
• The next two digits (00 in the example) refers to the switch configuration code.
+
• The next six digits (320451 in the example) refers to the SIM number.
+
• The last digit which is separated from the rest is called the “check digit”.
+
  
These digits can be further grouped for additional information:
+
= Requirements =
• The first 3 to 4 digits represents the Mobile Country Code (MCC) 
+
(Some cards only have 3 digits to represent the Telecom ID and country code.)
+
• The next 2 digits represent the Mobile Network Code (MNC, AKA the mobile operator)
+
• The next 12 digits is the number represent the Home Location Register
+
• And mentioned above, the “check digit”
+
  
== Location Area Identity==
+
* [[User:Joachim Metz|Joachim]] A name for the tooling I propose coldcut
  
Cellular networks are divided into geographic regions called Location Areas.  Each area is identified by its own unique identification number, the LAI (Location Area Identity).  A phone stores this number on its SIM card so that it knows which area it is in and can receive service.  When a phone moves to a new Location Area, it stores the new LAI on the SIM card, adding it to a list of the previous LAIs it has been in.  This way, if a phone is powered down, when it boots back up it can search its stored list of LAIs until it finds the one it is in and can start to receive service again.  This is much quicker than scanning the whole list of frequencies that a telephone can access.
+
[[User:Joachim Metz|Joachim]] Could we do a MoSCoW evaluation of these.
This is a real plus for forensic investigators because when a SIM card is reviewed, they can get a general idea of where the SIM card has been geographically.  In turn this tells them where the phone has been and can then relate back to where the individual who owns the phone has been.  
+
  
== SIM Security ==
+
* AFF and EWF file images supported from scratch. ([[User:Joachim Metz|Joachim]] I would like to have raw/split raw and device access as well)
 +
* [[User:Joachim Metz|Joachim]] volume/partition aware layer (what about carving unpartitioned space)
 +
* File system aware layer.
 +
** By default, files are not carved. (clarify: only identified? [[User:RB|RB]]; I guess that it operates like [[Selective file dumper]] [[User:.FUF|.FUF]] 07:00, 29 October 2008 (UTC))
 +
* Plug-in architecture for identification/validation.
 +
** [[User:Joachim Metz|Joachim]] support for multiple types of validators
 +
*** dedicated validator
 +
*** validator based on file library (i.e. we could specify/implement a file structure for these)
 +
*** configuration based validator (Can handle config files,like Revit07, to enter different file formats used by the carver.)
 +
* Ship with validators for:
 +
[[User:Joachim Metz|Joachim]] I think we should distinguish between file format validators and content validators
 +
** JPEG
 +
** PNG
 +
** GIF
 +
** MSOLE
 +
** ZIP
 +
** TAR (gz/bz2)
  
Information inside the UICC can be protected with a PIN and a PUK.
+
[[User:Joachim Metz|Joachim]] For a production carver we need at least the following formats
 +
** Graphical Images
 +
*** JPEG (the 3 different types with JFIF/EXIF support)
 +
*** PNG
 +
*** GIF
 +
*** BMP
 +
*** TIFF
 +
** Office documents
 +
*** OLE2 (Word/Excel content support)
 +
*** PDF
 +
*** Open Office/Office 2007 (ZIP+XML)
 +
** Archive files
 +
*** ZIP
 +
** E-mail files
 +
*** PFF (PST/OST)
 +
*** MBOX (text based format, base64 content support)
 +
** Audio/Video files
 +
*** MPEG
 +
*** MP2/MP3
 +
*** AVI
 +
*** ASF/WMV
 +
*** QuickTime
 +
** Printer spool files
 +
*** EMF (if I remember correctly)
 +
** Internet history files
 +
*** index.dat
 +
*** Firefox (SQLite 3)
 +
** Other files
 +
*** thumbs.db
  
A PIN locks the SIM card until correct code is entered. Each phone network sets the PIN of SIM to a standard default number (this can be changed via handset). If PIN protection is enabled, the PIN will need to be entered each time phone is switched on. If the PIN is entered incorrectly 3 times in a row, the SIM card will be blocked requiring a PUK from the network/service provider.
+
* Simple fragment recovery carving using gap carving.
 +
** [[User:Joachim Metz|Joachim]] have hook in for more advanced fragment recovery?
 +
* Recovering of individual ZIP sections and JPEG icons that are not sector aligned.
 +
** [[User:Joachim Metz|Joachim]] I would propose a generic fragment detection and recovery
 +
* Autonomous operation (some mode of operation should be completely non-interactive, requiring no human intervention to complete [[User:RB|RB]])
 +
** [[User:Joachim Metz|Joachim]] as much as possible, but allow to be overwritten by user
 +
* Tested on 500GB-sized images. Should be able to carve a 500GB image in roughly 50% longer than it takes to read the image.
 +
** Perhaps allocate a percentage budget per-validator (i.e. each validator adds N% to the carving time)
 +
** [[User:Joachim Metz|Joachim]] have multiple carving phases for precision/speed trade off?
 +
* Parallelizable
 +
** [[User:Joachim Metz|Joachim]] tunable for different architectures
 +
* Configuration:
 +
** Capability to parse some existing carvers' configuration files, either on-the-fly or as a one-way converter.
 +
** Disengage internal configuration structure from configuration files, create parsers that present the expected structure
 +
** [[User:Joachim Metz|Joachim]] The validator should deal with the file structure the carving algorithm should not know anything about the file structure (as in revit07 design)
 +
**  Either extend Scalpel/Foremost syntaxes for extended features or use a tertiary syntax ([[User:Joachim Metz|Joachim]] I would prefer a derivative of the revit07 configuration syntax which already has encountered some problems of dealing with defining file structure in a configuration file)
 +
* Can output audit.txt file.
 +
* [[User:Joachim Metz|Joachim]] Can output database with offset analysis values i.e. for visualization tooling
 +
* [[User:Joachim Metz|Joachim]] Can output debug log for debugging the algorithm/validation
 +
* Easy integration into ascription software.
 +
** [[User:Joachim Metz|Joachim]] I'm no native speaker what do you mean with "ascription software"?
  
A PUK is needed if the PIN is entered incorrectly 3 times and the SIM is blocked (phone is unable to make and receive calls/texts). The PUK can be received from the network provider, or possibly the GSM cell phone manual. '''Caution:''' if PUK is entered 10 times incorrectly, the SIM card is permanently disabled and must be exchanged.
+
= Ideas =
 +
* Use as much TSK if possible. Don't carry your own FS implementation the way photorec does.
 +
** [[User:Joachim Metz|Joachim]] using TSK as much as possible would not allow to add your own file system support (i.e. mobile phones, memory structures, cap files)
 +
I would propose wrapping TSK and using it as much as possible but allow to integrate own FS implementations.
 +
* Extracting/carving data from [[Thumbs.db]]? I've used [[foremost]] for it with some success. [[Vinetto]] has some critical bugs :( [[User:.FUF|.FUF]] 19:18, 28 October 2008 (UTC)
 +
* Carving data structures. For example, extract all TCP headers from image by defining TCP header structure and some fields (e.g. source port > 1024, dest port = 80). This will extract all data matching the pattern and write a file with other fields. Another example is carving INFO2 structures and URL activity records from index.dat [[User:.FUF|.FUF]] 20:51, 28 October 2008 (UTC)
 +
** This has the opportunity to be extended to the concept of "point at blob FOO and interpret it as BAR"
  
== SIM Forensics ==
+
.FUF added:
 +
The main idea is to allow users to define structures, for example (in pascal-like form):
  
The data that a SIM card can provide the forensics examiner can be invaluable to an investigation. Acquiring a SIM card allows a large amount of information that the suspect has dealt with over the phone to be investigated.
+
<pre>
 +
Field1: Byte = 123;
 +
SomeTextLength: DWORD;
 +
SomeText: string[SomeTextLength];
 +
Field4: Char = 'r';
 +
...
 +
</pre>
  
In general, some of this data can help an investigator determine:
+
This will produce something like this:
* Phone numbers of calls made/received
+
<pre>
* Contacts
+
Field1 = 123
* [[SMS]] details (time/date, recipient, etc.)
+
SomeTextLength = 5
* SMS text (the message itself)
+
SomeText = 'abcd1'
 +
Field4 = 'r'
 +
</pre>
  
There are many software solutions that can help the examiner to acquire the information from the SIM card. Several products include 3GForensics SIMIS [http://www.3gforensics.co.uk/products.htm], Inside Out's [http://simcon.no/ SIMCon], or SIM Content Controller, and Paraben Forensics' [http://www.paraben-forensics.com/catalog/product_info.php?products_id=289 SIM Card Seizure].
+
(In text or raw forms.)
  
The SIM file system is hierarchical in nature consisting of 3 parts:
+
Opinions?
*Master File (MF) - root of the file system that contains
+
DF’s and EF’s
+
*Dedicated File (DF)
+
*Elementary Files (EF)
+
  
 +
Opinion: Simple pattern identification like that may not suffice, I think Simson's original intent was not only to identify but to allow for validation routines (plugins, as the original wording was).  As such, the format syntax would need to implement a large chunk of some programming language in order to be sufficiently flexible. [[User:RB|RB]]
  
=== Data Acquisition ===
+
=File System Awareness =
 +
==Background: Why be File System Aware?==
 +
Advantages of being FS aware:
 +
* You can pick up sector allocation sizes ([[User:Joachim Metz|Joachim]] do you mean file system block sizes?)
 +
* Some file systems may store things off sector boundaries. (ReiserFS with tail packing)
 +
* Increasingly file systems have compression (NTFS compression)
 +
* Carve just the sectors that are not in allocated files.
  
These software titles can extract such technical data from the SIM card as:
+
==Tasks that would be required==
  
* '''International Mobile Subscriber Identity (IMSI)''': A unique identifying number that identifies the phone/subscription to the [[GSM]] network
+
==Discussion==
* '''Mobile Country Code (MCC)''': A three-digit code that represents the SIM card's country of origin
+
:: As noted above, TSK should be utilized as much as possible, particularly the filesystem-aware portion.  If we want to identify filesystems outside of its supported set, it would be more worth our time to work on implementing them there than in the carver itself.  [[User:RB|RB]]
* '''Mobile Network Code (MNC)''': A two-digit code that represents the SIM card's home network
+
* '''Mobile Subscriber Identification Number (MSIN)''': A unique ten-digit identifying number that identifies the specific subscriber to the GSM network
+
* '''Mobile Subscriber International ISDN Number (MSISDN)''': A number that identifies the phone number used by the handset
+
* '''Abbreviated Dialing Numbers (ADN)''': Telephone numbers stored in the SIM's memory
+
* '''Last Dialed Numbers (LDN)'''
+
* '''Short Message Service (SMS)''': Text Messages
+
* '''Public Land Mobile Network (PLMN) selector'''
+
* '''Forbidden PLMNs'''
+
* '''Location Information (LOCI)'''
+
* '''General Packet Radio Service (GPRS) location'''
+
* '''Integrated Circuit Card Identifier (ICCID)'''
+
* '''Service Provider Name (SPN)'''
+
* '''Phase Identification'''
+
* '''SIM Service Table (SST)'''
+
* '''Language Preference (LP)'''
+
* '''Card Holder Verification (CHV1) and (CHV2)'''
+
* '''Broadcast Control Channels (BCCH)'''
+
* '''Ciphering Key (Kc)'''
+
* '''Ciphering Key Sequence Number'''
+
* '''Emergency Call Code'''
+
* '''Fixed Dialing Numbers (FDN)'''
+
* '''Forbidden PLMNs'''
+
* '''Location Area Identity (LAI)'''
+
* '''Own Dialing Number'''
+
* '''Temporary Mobile Subscriber Identity (TMSI)'''
+
* '''Routing Area Identifier (RAI) network code'''
+
* '''Service Dialing Numbers (SDNs)'''
+
* '''Service Provider Name'''
+
* '''Depersonalization Keys'''
+
  
This information can be used to contact the service provider to obtain even more information than is stored on the SIM card.
+
[[User:Joachim Metz|Joachim]] I would like to have the carver (recovery tool) also do recovery using file allocation data or remainders of file allocation data.
  
== USIM-Universal Subscriber Identity Module ==
+
:::: I guess this tool operates like [[Selective file dumper]] and can recover files in both ways (or not?). Recovering files by using carving can recover files in situations where sleuthkit does nothing (e.g. file on NTFS was deleted using ntfs-3g, or filesystem was destroyed or just unknown). And we should build the list of filesystems supported by carver, not by TSK. [[User:.FUF|.FUF]] 07:08, 29 October 2008 (UTC)
  
A Universal Subscriber Identity Module is an application for UMTS mobile telephony running on a UICC smart card which is inserted in a 3G mobile phone. There is a common misconception to call the UICC card itself a USIM, but the USIM is merely a logical entity on the physical card.
+
:: This tool is still in the early planning stages (requirements discovery), hence few operational details (like precise modes of operation) have been fleshed out - those will and should come later. The justification for strictly using TSK for the filesystem-sensitive approach is simple: TSK has good filesystem APIs, and it would be foolish to create yet another standalone, incompatible implementation of filesystem(foo) when time would be better spent improving those in TSK, aiding other methods of analysis as well.  This is the same reason individuals that have implemented several other carvers are participating: de-duplication of effort. [[User:RB|RB]]
  
It stores user subscriber information, authentication information and provides storage space for text messages and phone book contacts. The phone book on a UICC has been greatly enhanced.
+
[[User:Joachim Metz|Joachim]]
 +
I would go as far to ask you all to look beyond the carver as a tool and look from the perspective of the carver as part of the forensic investigation process. In my eyes certain information needed/acquired by the carver could be also very useful investigative information i.e. what part of a hard disk contains empty sectors.
  
For authentication purposes, the USIM stores a long-term preshared secret key K, which is shared with the Authentication Center (AuC) in the network. The USIM also verifies a sequence number that must be within a range using a window mechanism to avoid replay attacks, and is in charge of generating the session keys CK and IK to be used in the confidentiality and integrity algorithms of the KASUMI block cipher in Universal Mobile Telecommunications System (UMTS).
+
[[User:Joachim Metz|Joachim]]
 +
I'm missing a part on the page about the carving challenges (scenarios)
 +
* normal file (file structure, loose text based structure (more a content structure?))
 +
* fragmented file (the file entirely exist)
 +
* a file fragment (the file does not entirely exist)
 +
* intertwined file
 +
* encapsulated file (MPEG/network capture)
 +
* embedded file (JPEG thumbnail)
  
In Mobile Financial Services, the USIM seems to be a mandatory Security Element for user authentication, authorization and stored credentials. With the integration of an NFC handset and a USIM, users will be able to make proximity payments, where the NFC handset enables contactless payment and the USIM provides an independent security element.
+
=Validator Construction=
This is the evolution of the SIM for 3G devices. It can allow for multiple phone numbers to be assigned to the USIM, thus giving more than one phone number to a device.
+
Options:
 +
* Write validators in C/C++
 +
* Have a scripting language for writing them (python? Perl?) our own?
 +
** [[User:Joachim Metz|Joachim]] use easy to embed programming languages, e.g. Python or Lua
 +
* Use existing programs (libjpeg?) as plug-in validators?
 +
** [[User:Joachim Metz|Joachim]] define a file structure api for this
  
== Service Provider Data ==
+
=Existing Code that we have=
  
Some additional information the service provider might store:
+
[[User:Joachim Metz|Joachim]]
 +
* DFRWS2006/2007 carving challenge results
 +
* photorec
 +
* revit06 and revit07
 +
* s3/scarve
  
* A customer database
+
=Implementation Timeline=
* [[Call Detail Record]]s (CDR)
+
# gather the available resources/ideas/wishes/needs etc. (I guess we're in this phase)
* [[Home Location Register]] (HLR)
+
# start discussing a high level design (in terms of algorithm, facilities, information needed)
 
+
## input formats facility
 
+
## partition/volume facility
== Service Providers that use SIM Cards in the United States ==
+
## file system facility
* T-Mobile
+
## file format facility
* Cingular/AT&T
+
## content facility
 
+
## how to deal with fragment detection (do the validators allow for fragment detection?)
== Sim Card Text Encoding ==
+
## how to deal with recombination of fragments
 
+
## do we want multiple carving phases in light of speed/precision tradeoffs
Originally the middle-European [[GSM]] network used only a 7-bit code derived from the basic [[ASCII]] code. However, as GSM spread worldwide it was concluded that more characters, such as the major characters of all living languages, should be representable on GSM phones. Thus, there was a movement towards a 16-bit code known as [[UCS-2]], which is now the standard in GSM text encoding. This change in encoding can make it more difficult to accurately obtain data from [[SIM cards]] of the older generation which use the 7-bit encoding. This encoding is used to compress the hexadecimal size of certain elements of the SIM's data, particularly in [[SMS]] and [[Abbreviated Dialing Numbers]].
+
# start detailing parts of the design
 
+
## Discuss options for a grammar driven validator?
== Authentication Key (Ki) ==
+
## Hard-coded plug-ins?
The authentication key or Ki is a 128 bit key used in the authentication and cipher key generation process. In a nutshell, the key is used to authenticate the SIM on the GSM network. Each SIM contains this key which is assigned to it by the operator during the personalization process. The SIM card is specially designed so the Ki can't be compromised using a smart-card interface. However, flaws in the GSM cryptography have been discovered that do allow the extraction of the Ki from the SIM card, and essentially SIM card duplication.
+
## Which existing code can we use?
 
+
# start building/assembling parts of the tooling for a prototype
== See also ==
+
## Implement simple file carving with validation.
 
+
## Implement gap carving
* [[SIM Card Forensics]]
+
# Initial Release
 
+
# Implement the ''threaded carving'' that [[User:.FUF|.FUF]] is describing above.
== References ==
+
 
+
* [http://www.simcon.no/ SIMCon]
+
* [http://www.sectorforensics.co.uk/sim-examination.shtml Sector Forensics]
+
* [http://www.utica.edu/academic/institutes/ecii/ijde/articles.cfm?action=issue&id=5  IJDE Spring 2003 Volume 2, Issue 1 ]: [http://www.utica.edu/academic/institutes/ecii/publications/articles/A0658858-BFF6-C537-7CF86A78D6DE746D.pdf Forensics and the GSM Mobile Telephone System] (PDF)
+
* http://en.wikipedia.org/wiki/Subscriber_Identity_Module
+

Revision as of 04:16, 31 October 2008

This page is for planning Carver 2.0.

Please, do not delete text (ideas) here. Use something like this:

<s>bad idea</s>
:: good idea

This will look like:

bad idea

good idea

License

BSD-3.

OS

Linux/FreeBSD/MacOS

(shouldn't this just match what the underlying afflib & sleuthkit cover? RB)
Yes, but you need to test and validate on each. Question: Do we want to support Windows? Simsong 21:09, 30 October 2008 (UTC)
Joachim I think we would be wise to design with Windows support from the start; this will improve platform independence.

Requirements

  • Joachim As a name for the tooling I propose coldcut

Joachim Could we do a MoSCoW evaluation of these?

  • AFF and EWF file images supported from scratch. (Joachim I would like to have raw/split raw and device access as well)
  • Joachim volume/partition aware layer (what about carving unpartitioned space)
  • File system aware layer.
    • By default, files are not carved. (clarify: only identified? RB; I guess that it operates like Selective file dumper .FUF 07:00, 29 October 2008 (UTC))
  • Plug-in architecture for identification/validation.
    • Joachim support for multiple types of validators
      • dedicated validator
      • validator based on file library (i.e. we could specify/implement a file structure for these)
      • configuration-based validator (can handle config files, like Revit07's, that define the file formats used by the carver)
  • Ship with validators for:

Joachim I think we should distinguish between file format validators and content validators

    • JPEG
    • PNG
    • GIF
    • MSOLE
    • ZIP
    • TAR (gz/bz2)
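To make the plug-in idea concrete, here is a minimal sketch of what identification plug-ins for three of the formats above might look like. It is only an illustration: the `VALIDATORS` registry, the `validator` decorator and the `identify` helper are hypothetical names, not part of any agreed design, and the checks only look at the well-known leading signature bytes (identification, not full validation).

```python
from typing import Callable, Dict, List

# Hypothetical registry: format name -> function(bytes) -> bool.
VALIDATORS: Dict[str, Callable[[bytes], bool]] = {}


def validator(name: str):
    """Decorator that registers a check function under a format name."""
    def register(func: Callable[[bytes], bool]) -> Callable[[bytes], bool]:
        VALIDATORS[name] = func
        return func
    return register


@validator("JPEG")
def is_jpeg(data: bytes) -> bool:
    # A JPEG stream starts with the SOI marker (FF D8) followed by another marker (FF ..).
    return data[:3] == b"\xff\xd8\xff"


@validator("PNG")
def is_png(data: bytes) -> bool:
    # Fixed 8-byte PNG signature.
    return data[:8] == b"\x89PNG\r\n\x1a\n"


@validator("GIF")
def is_gif(data: bytes) -> bool:
    # Both GIF versions share the same 6-byte header layout.
    return data[:6] in (b"GIF87a", b"GIF89a")


def identify(data: bytes) -> List[str]:
    """Return the names of all registered formats whose signature matches."""
    return [name for name, check in VALIDATORS.items() if check(data)]
```

A real validator would go on to check the internal structure (segments, chunks, checksums); the registry only shows how dedicated validators could be plugged in side by side.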

Joachim For a production carver we need at least the following formats

    • Graphical Images
      • JPEG (the 3 different types with JFIF/EXIF support)
      • PNG
      • GIF
      • BMP
      • TIFF
    • Office documents
      • OLE2 (Word/Excel content support)
      • PDF
      • Open Office/Office 2007 (ZIP+XML)
    • Archive files
      • ZIP
    • E-mail files
      • PFF (PST/OST)
      • MBOX (text based format, base64 content support)
    • Audio/Video files
      • MPEG
      • MP2/MP3
      • AVI
      • ASF/WMV
      • QuickTime
    • Printer spool files
      • EMF (if I remember correctly)
    • Internet history files
      • index.dat
      • Firefox (SQLite 3)
    • Other files
      • thumbs.db
  • Simple fragment recovery carving using gap carving.
    • Joachim have hook in for more advanced fragment recovery?
  • Recovering of individual ZIP sections and JPEG icons that are not sector aligned.
    • Joachim I would propose a generic fragment detection and recovery
  • Autonomous operation (some mode of operation should be completely non-interactive, requiring no human intervention to complete RB)
    • Joachim as much as possible, but allow it to be overridden by the user
  • Tested on 500GB-sized images. Should be able to carve a 500GB image in roughly 50% longer than it takes to read the image.
    • Perhaps allocate a percentage budget per-validator (i.e. each validator adds N% to the carving time)
    • Joachim have multiple carving phases for precision/speed trade off?
  • Parallelizable
    • Joachim tunable for different architectures
  • Configuration:
    • Capability to parse some existing carvers' configuration files, either on-the-fly or as a one-way converter.
    • Disengage internal configuration structure from configuration files, create parsers that present the expected structure
    • Joachim The validator should deal with the file structure; the carving algorithm should not know anything about the file structure (as in the revit07 design)
    • Either extend Scalpel/Foremost syntaxes for extended features or use a tertiary syntax (Joachim I would prefer a derivative of the revit07 configuration syntax, which has already run into some of the problems of defining file structure in a configuration file)
  • Can output audit.txt file.
  • Joachim Can output database with offset analysis values i.e. for visualization tooling
  • Joachim Can output debug log for debugging the algorithm/validation
  • Easy integration into ascription software.
    • Joachim I'm not a native speaker; what do you mean by "ascription software"?
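As a rough illustration of the "simple fragment recovery using gap carving" requirement, the sketch below assumes a file split into exactly two fragments between a known header offset and a known footer offset. It tries every block-aligned gap, removes it, and asks a validator whether the result is a valid file. The function name `gap_carve`, its parameters and the toy validator in the example are assumptions made for illustration only.

```python
from typing import Callable, Optional, Tuple


def gap_carve(image: bytes, start: int, end: int, block: int,
              is_valid: Callable[[bytes], bool]
              ) -> Optional[Tuple[bytes, int, int]]:
    """Try to recover a file of at most two fragments from image[start:end].

    Returns (recovered_bytes, gap_start, gap_end), or None if no gap works.
    gap_start == gap_end means the file was contiguous.
    """
    # Cheapest case first: the region is already a valid, contiguous file.
    if is_valid(image[start:end]):
        return image[start:end], end, end

    blocks = (end - start) // block
    # Try small gaps first; keep at least one block before and after the gap.
    for gap_len in range(1, blocks - 1):
        for gap_first in range(1, blocks - gap_len):
            gap_start = start + gap_first * block
            gap_end = gap_start + gap_len * block
            candidate = image[start:gap_start] + image[gap_end:end]
            if is_valid(candidate):
                return candidate, gap_start, gap_end
    return None


# Toy example with 4-byte "blocks": one foreign block sits inside the file.
image = b"JUNK" + b"HEAD" + b"XXXX" + b"BODY" + b"FOOT" + b"JUNK"
result = gap_carve(image, 4, 20, 4, lambda d: d == b"HEADBODYFOOT")
```

The number of validator calls grows quadratically with the number of blocks in the region, which is one reason a per-validator time budget, as suggested above, may matter.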

Ideas

  • Use TSK as much as possible. Don't carry your own FS implementation the way photorec does.
    • Joachim relying on TSK alone would not allow adding your own file system support (e.g. mobile phones, memory structures, cap files)

I would propose wrapping TSK and using it as much as possible, but allowing integration of our own FS implementations.

  • Extracting/carving data from Thumbs.db? I've used foremost for it with some success. Vinetto has some critical bugs :( .FUF 19:18, 28 October 2008 (UTC)
  • Carving data structures. For example, extract all TCP headers from image by defining TCP header structure and some fields (e.g. source port > 1024, dest port = 80). This will extract all data matching the pattern and write a file with other fields. Another example is carving INFO2 structures and URL activity records from index.dat .FUF 20:51, 28 October 2008 (UTC)
    • This has the opportunity to be extended to the concept of "point at blob FOO and interpret it as BAR"
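The TCP header example can be sketched with Python's `struct` module. This is a minimal illustration, not a proposed implementation: it scans every byte offset for a 20-byte base TCP header whose fields satisfy the constraints mentioned above (source port > 1024, destination port = 80), plus a sanity check that the data offset is at least 5 words. The function name and the choice of reported fields are assumptions.

```python
import struct
from typing import Iterator, Tuple

# Base TCP header, network byte order: source port, destination port,
# sequence number, acknowledgement number, data offset + flags,
# window, checksum, urgent pointer (20 bytes in total).
TCP_HEADER = struct.Struct("!HHIIHHHH")


def carve_tcp_headers(image: bytes) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (offset, src_port, dst_port, seq) for every matching candidate."""
    for offset in range(len(image) - TCP_HEADER.size + 1):
        (src, dst, seq, _ack, off_flags,
         _window, _checksum, _urgent) = TCP_HEADER.unpack_from(image, offset)
        data_offset = off_flags >> 12  # header length in 32-bit words
        if src > 1024 and dst == 80 and data_offset >= 5:
            yield offset, src, dst, seq


# Example: one synthetic header surrounded by zero bytes.
header = TCP_HEADER.pack(40000, 80, 1, 0, (5 << 12) | 0x002, 65535, 0, 0)
hits = list(carve_tcp_headers(b"\x00" * 7 + header + b"\x00" * 5))
```

Constraints this loose will produce false positives on real images; a practical definition would need further checks (for example, a plausible preceding IP header).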

.FUF added: The main idea is to allow users to define structures, for example (in pascal-like form):

Field1: Byte = 123;
SomeTextLength: DWORD;
SomeText: string[SomeTextLength];
Field4: Char = 'r';
...

This will produce something like this:

Field1 = 123
SomeTextLength = 5
SomeText = 'abcd1'
Field4 = 'r'

(In text or raw forms.)
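For comparison, this is roughly what a hand-written matcher for the structure above would have to do. It is a sketch under assumptions the definition does not state: DWORD is taken to be a 4-byte little-endian unsigned integer, strings are decoded as Latin-1, and `match_record` is a hypothetical name.

```python
import struct
from typing import Any, Dict, Optional


def match_record(data: bytes, offset: int = 0) -> Optional[Dict[str, Any]]:
    """Match the example structure at `offset`; return its fields or None."""
    fixed = struct.Struct("<BI")  # Field1: Byte, SomeTextLength: DWORD
    if len(data) - offset < fixed.size:
        return None
    field1, text_len = fixed.unpack_from(data, offset)
    if field1 != 123:  # constraint: Field1 = 123
        return None

    text_start = offset + fixed.size
    text_end = text_start + text_len
    if text_end + 1 > len(data):  # SomeText plus Field4 must fit
        return None

    field4 = data[text_end:text_end + 1]
    if field4 != b"r":  # constraint: Field4 = 'r'
        return None

    return {
        "Field1": field1,
        "SomeTextLength": text_len,
        "SomeText": data[text_start:text_end].decode("latin-1"),
        "Field4": field4.decode("latin-1"),
    }


# The sample record from the text.
blob = bytes([123]) + struct.pack("<I", 5) + b"abcd1" + b"r"
record = match_record(blob)
```

A definition language would generate this kind of code from the declaration; note that the variable-length `SomeText` field already forces the matcher to interpret an earlier field before it can locate a later one.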

Opinions?

Opinion: Simple pattern identification like that may not suffice, I think Simson's original intent was not only to identify but to allow for validation routines (plugins, as the original wording was). As such, the format syntax would need to implement a large chunk of some programming language in order to be sufficiently flexible. RB

File System Awareness

Background: Why be File System Aware?

Advantages of being FS aware:

  • You can pick up sector allocation sizes (Joachim do you mean file system block sizes?)
  • Some file systems may store things off sector boundaries. (ReiserFS with tail packing)
  • Increasingly file systems have compression (NTFS compression)
  • Carve just the sectors that are not in allocated files.
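The last point can be illustrated independently of any file system library. The sketch below assumes the file system layer (TSK or otherwise) has already produced a list of allocated extents as `(first_sector, sector_count)` pairs; the function name and that input format are assumptions, and no TSK API is used here.

```python
from typing import Iterable, List, Tuple

Extent = Tuple[int, int]  # (first_sector, sector_count)


def unallocated_runs(total_sectors: int,
                     allocated: Iterable[Extent]) -> List[Extent]:
    """Return the sector runs not covered by any allocated extent.

    Extents may be given in any order and may overlap.
    """
    runs: List[Extent] = []
    cursor = 0  # first sector not yet known to be allocated
    for first, count in sorted(allocated):
        if first > cursor:
            runs.append((cursor, first - cursor))
        cursor = max(cursor, first + count)
    if cursor < total_sectors:
        runs.append((cursor, total_sectors - cursor))
    return runs


# Example: a 100-sector volume with four allocated extents, two overlapping.
free = unallocated_runs(100, [(10, 5), (0, 4), (12, 8), (90, 10)])
```

The carver would then only read and scan the returned runs, which is where the time saving over carving the whole image would come from.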

Tasks that would be required

Discussion

As noted above, TSK should be utilized as much as possible, particularly the filesystem-aware portion. If we want to identify filesystems outside of its supported set, it would be more worth our time to work on implementing them there than in the carver itself. RB

Joachim I would like to have the carver (recovery tool) also do recovery using file allocation data or remainders of file allocation data.

I guess this tool operates like Selective file dumper and can recover files in both ways (or not?). Recovering files by using carving can recover files in situations where sleuthkit does nothing (e.g. file on NTFS was deleted using ntfs-3g, or filesystem was destroyed or just unknown). And we should build the list of filesystems supported by carver, not by TSK. .FUF 07:08, 29 October 2008 (UTC)
This tool is still in the early planning stages (requirements discovery), hence few operational details (like precise modes of operation) have been fleshed out - those will and should come later. The justification for strictly using TSK for the filesystem-sensitive approach is simple: TSK has good filesystem APIs, and it would be foolish to create yet another standalone, incompatible implementation of filesystem(foo) when time would be better spent improving those in TSK, aiding other methods of analysis as well. This is the same reason individuals that have implemented several other carvers are participating: de-duplication of effort. RB

Joachim I would go as far as to ask you all to look beyond the carver as a tool and look at it as part of the forensic investigation process. In my eyes, certain information needed or acquired by the carver could also be very useful investigative information, e.g. which parts of a hard disk contain empty sectors.

Joachim I'm missing a part on the page about the carving challenges (scenarios)

  • normal file (file structure, loose text based structure (more a content structure?))
  • fragmented file (the file exists in its entirety)
  • a file fragment (the file does not entirely exist)
  • intertwined file
  • encapsulated file (MPEG/network capture)
  • embedded file (JPEG thumbnail)

Validator Construction

Options:

  • Write validators in C/C++
  • Have a scripting language for writing them (python? Perl?) our own?
    • Joachim use easy to embed programming languages, e.g. Python or Lua
  • Use existing programs (libjpeg?) as plug-in validators?
    • Joachim define a file structure api for this
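As one illustration of the "use existing code as a validator" option, the sketch below wraps Python's standard `zipfile` module as a ZIP validator. It stands in for the libjpeg idea rather than implementing it, and the function name `validate_zip` is hypothetical. Any parsing failure is treated as "not valid".

```python
import io
import zipfile


def validate_zip(data: bytes) -> bool:
    """Return True if `data` parses as a ZIP archive and all member CRCs match."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except Exception:
        # Broad on purpose: an existing library may fail in many different
        # ways on carved data, and a validator should never crash the carver.
        return False


# Example: build a small archive in memory, then damage it by truncation.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.txt", "hello")
good = buffer.getvalue()
truncated = good[: len(good) // 2]
```

The trade-off is that a library written for well-formed input usually answers only valid or invalid; it does not say where a file stops being valid, which fragment detection would need.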

Existing Code that we have

Joachim

  • DFRWS2006/2007 carving challenge results
  • photorec
  • revit06 and revit07
  • s3/scarve

Implementation Timeline

  1. gather the available resources/ideas/wishes/needs etc. (I guess we're in this phase)
  2. start discussing a high level design (in terms of algorithm, facilities, information needed)
    1. input formats facility
    2. partition/volume facility
    3. file system facility
    4. file format facility
    5. content facility
    6. how to deal with fragment detection (do the validators allow for fragment detection?)
    7. how to deal with recombination of fragments
    8. do we want multiple carving phases in light of speed/precision tradeoffs
  3. start detailing parts of the design
    1. Discuss options for a grammar driven validator?
    2. Hard-coded plug-ins?
    3. Which existing code can we use?
  4. start building/assembling parts of the tooling for a prototype
    1. Implement simple file carving with validation.
    2. Implement gap carving
  5. Initial Release
  6. Implement the threaded carving that .FUF is describing above.