Difference between pages "Carver 2.0 Planning Page" and "Excel Spreadsheet (XLSB)"

From ForensicsWiki
(Difference between pages)
m (License: dropping my Q to discussion page)
 
(MIME types)
 
Line 1: Line 1:
This page is for planning Carver 2.0.
+
The '''Excel Spreadsheet (XLSB) file format''' has the '''.xlsb''' extension. This file type originates from [[Microsoft Excel]]. However, other spreadsheet software can be used to display these files as well. These include:
 +
* [[OpenOffice]] (probably as of version 3)
  
Please, do not delete text (ideas) here. Use something like this:
+
== MIME types ==
  
<pre>
+
== File signature ==
<s>bad idea</s>
+
:: good idea
+
</pre>
+
  
This will look like:
+
[[Microsoft Excel]] spreadsheets of version 2007 are stored in a [[ZIP archive]] file. These files therefore start with the ZIP file signature (hex 50 4B 03 04, i.e. "PK" followed by the bytes 03 04).
  
<s>bad idea</s>
+
== See Also ==
:: good idea
+
  
= License =
+
[[Excel Spreadsheet (XLS)]]
  
BSD-3.
+
[[Excel Spreadsheet (XLSX)]]
  
= OS =
+
[http://download.microsoft.com/download/0/B/E/0BE8BDD7-E5E8-422A-ABFD-4342ED7AD886/Excel2007BinaryFileFormat(xlsb)Specification.pdf Excel 2007 Binary File format by Microsoft]
  
Linux/FreeBSD/MacOS (shouldn't this just match what the underlying afflib & sleuthkit cover? [[User:RB|RB]])
+
[[Category:File Formats]]
 
+
= Requirements =
+
* AFF and EWF file images supported from scratch.
+
* File system aware layer.
+
** By default, files are not carved. (clarify: only identified? [[User:RB|RB]]; I guess that it operates like [[Selective file dumper]] [[User:.FUF|.FUF]] 07:00, 29 October 2008 (UTC))
+
* Plug-in architecture for identification/validation.
+
** Can handle config files, like Revit07, to enter different file formats used by the carver.
+
* Ship with validators for:
+
** JPEG
+
** PNG
+
** GIF
+
** MSOLE
+
** ZIP
+
** TAR (gz/bz2)
+
* Simple fragment recovery carving using gap carving.
+
* Recovery of individual ZIP sections and JPEG icons that are not sector aligned.
+
* Autonomous operation (some mode of operation should be completely non-interactive, requiring no human intervention to complete [[User:RB|RB]])
+
* Tested on 500GB-sized images. Should be able to carve a 500GB image in roughly 50% longer than it takes to read the image.
+
** Perhaps allocate a percentage budget per-validator (i.e. each validator adds N% to the carving time)
+
* Parallelizable.
+
* Configuration:
+
** Capability to parse some existing carvers' configuration files, either on-the-fly or as a one-way converter.
+
** Decouple the internal configuration structure from the configuration files; create parsers that present the expected structure.
+
** Either extend Scalpel/Foremost syntaxes for extended features or use a tertiary syntax.
+
* Can output audit.txt file.
+
* Easy integration into ascription software.
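
The gap carving requirement in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the planned implementation: the names <code>gap_carve</code> and <code>toy_validate</code>, the 512-byte sector size, and the toy "HEAD + CRC-32" format are all hypothetical. The idea shown is that a file split into two fragments lies between a known header sector and a known footer sector, so the carver removes every possible contiguous run of sectors (the gap) and keeps the first candidate that the validator accepts.

```python
import zlib
from typing import Callable, Optional

SECTOR = 512  # assumed sector size in bytes


def gap_carve(image: bytes, start: int, end: int,
              validate: Callable[[bytes], bool]) -> Optional[bytes]:
    """Recover a file stored in at most two fragments inside image[start:end].

    start is the sector-aligned offset of the header sector and end is the
    offset just past the footer sector. Both sectors are always kept; only
    the sectors between them may be dropped as the gap.
    """
    region = image[start:end]
    total = len(region) // SECTOR
    if validate(region):          # no gap at all: the file is contiguous
        return region
    for gap_len in range(1, total - 1):           # size of the gap in sectors
        for first in range(1, total - gap_len):   # first sector of the gap
            candidate = (region[:first * SECTOR]
                         + region[(first + gap_len) * SECTOR:])
            if validate(candidate):
                return candidate
    return None


def toy_validate(blob: bytes) -> bool:
    """Hypothetical format: b"HEAD", big-endian CRC-32 of the body, body."""
    if len(blob) < 8 or blob[:4] != b"HEAD":
        return False
    return zlib.crc32(blob[8:]) == int.from_bytes(blob[4:8], "big")
```

The number of candidates grows quadratically with the number of sectors between header and footer, which is one reason a per-validator time budget, as suggested above, may be needed.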
+
 
+
= Ideas =
+
* Use as much of TSK as possible. Don't carry your own FS implementation the way photorec does.
+
* Extracting/carving data from [[Thumbs.db]]? I've used [[foremost]] for it with some success. [[Vinetto]] has some critical bugs :( [[User:.FUF|.FUF]] 19:18, 28 October 2008 (UTC)
+
* Carving data structures. For example, extract all TCP headers from image by defining TCP header structure and some fields (e.g. source port > 1024, dest port = 80). This will extract all data matching the pattern and write a file with other fields. Another example is carving INFO2 structures and URL activity records from index.dat [[User:.FUF|.FUF]] 20:51, 28 October 2008 (UTC)
+
** This has the opportunity to be extended to the concept of "point at blob FOO and interpret it as BAR"
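
The TCP header example above could look roughly like this. It is a sketch under assumptions, not a proposed design: <code>carve_tcp_headers</code> and <code>TcpHeader</code> are hypothetical names, only the fixed 20-byte part of the TCP header is interpreted, and the extra check on the data offset field is an added sanity test that the original idea does not mention.

```python
import struct
from typing import Iterator, NamedTuple

# Fixed part of a TCP header: source port, destination port, sequence number,
# acknowledgement number, data offset/reserved, flags, window, checksum,
# urgent pointer. Network byte order, 20 bytes in total.
TCP_FIXED = struct.Struct("!HHIIBBHHH")


class TcpHeader(NamedTuple):
    offset: int      # byte offset of the candidate header in the input
    src_port: int
    dst_port: int
    seq: int
    ack: int
    flags: int


def carve_tcp_headers(data: bytes, min_src_port: int = 1025,
                      dst_port: int = 80) -> Iterator[TcpHeader]:
    """Yield every offset whose bytes match the header pattern."""
    for off in range(len(data) - TCP_FIXED.size + 1):
        (src, dst, seq, ack, off_res, flags,
         _window, _checksum, _urgent) = TCP_FIXED.unpack_from(data, off)
        if src < min_src_port or dst != dst_port:
            continue
        if (off_res >> 4) < 5:   # header length in 32-bit words, minimum 5
            continue
        yield TcpHeader(off, src, dst, seq, ack, flags)
```

A pattern this loose will produce false positives on real images, so the matches would still need validation (for example against surrounding IP headers) before being reported.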
+
 
+
.FUF added:
+
The main idea is to allow users to define structures, for example (in pascal-like form):
+
 
+
<pre>
+
Field1: Byte = 123;
+
SomeTextLength: DWORD;
+
SomeText: string[SomeTextLength];
+
Field4: Char = 'r';
+
...
+
</pre>
+
 
+
This will produce something like this:
+
<pre>
+
Field1 = 123
+
SomeTextLength = 5
+
SomeText = 'abcd1'
+
Field4 = 'r'
+
</pre>
+
 
+
(In text or raw forms.)
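
For illustration only, the example structure above could be matched by hand-written code like the following. The function names, the little-endian DWORD, the single-byte text encoding, and the <code>MAX_TEXT</code> sanity bound are all assumptions; a real implementation would generate such a matcher from the user's definition instead of hard-coding it.

```python
import struct
from typing import Iterator, Optional, Tuple

MAX_TEXT = 4096  # assumed upper bound for SomeTextLength


def match_record(data: bytes, off: int) -> Optional[dict]:
    """Try to interpret data[off:] as the example structure."""
    if off + 5 > len(data) or data[off] != 123:          # Field1: Byte = 123
        return None
    (text_len,) = struct.unpack_from("<I", data, off + 1)  # SomeTextLength
    end = off + 5 + text_len
    if text_len > MAX_TEXT or end + 1 > len(data):
        return None
    if data[end:end + 1] != b"r":                         # Field4: Char = 'r'
        return None
    return {
        "Field1": 123,
        "SomeTextLength": text_len,
        "SomeText": data[off + 5:end].decode("latin-1"),
        "Field4": "r",
    }


def carve_records(data: bytes) -> Iterator[Tuple[int, dict]]:
    """Scan every byte offset and yield (offset, fields) for each match."""
    for off in range(len(data)):
        rec = match_record(data, off)
        if rec is not None:
            yield off, rec
```

Fed a buffer that contains the byte 123, a length of 5, the text <code>abcd1</code> and the character <code>r</code>, this yields the same field values as the sample output above.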
+
 
+
Opinions?
+
 
+
Opinion: Simple pattern identification like that may not suffice. I think Simson's original intent was not only to identify but also to allow for validation routines (plugins, as the original wording was). As such, the format syntax would need to implement a large chunk of some programming language in order to be sufficiently flexible. [[User:RB|RB]]
+
 
+
= Supported File Systems =
+
 
+
Build a large list of supported filesystems. File carving programs ignore the filesystem, but this doesn't mean that they support all of them. Do we support Reiser4 with tail packing? Or exFAT? Or NTFS with compression? Document this. [[User:.FUF|.FUF]] 19:18, 28 October 2008 (UTC)
+
 
+
:: As noted above, TSK should be utilized as much as possible, particularly the filesystem-aware portion.  If we want to identify filesystems outside of its supported set, it would be more worth our time to work on implementing them there than in the carver itself.  [[User:RB|RB]]
+
 
+
:::: I guess this tool operates like [[Selective file dumper]] and can recover files in both ways (or not?). Carving can recover files in situations where sleuthkit does nothing (e.g. a file on NTFS was deleted using ntfs-3g, or the filesystem was destroyed or is simply unknown). And we should build the list of filesystems supported by the carver, not by TSK. [[User:.FUF|.FUF]] 07:08, 29 October 2008 (UTC)
+
 
+
:: This tool is still in the early planning stages (requirements discovery), hence few operational details (like precise modes of operation) have been fleshed out - those will and should come later.  The justification for strictly using TSK for the filesystem-sensitive approach is simple: TSK has good filesystem APIs, and it would be foolish to create yet another standalone, incompatible implementation of filesystem(foo) when time would be better spent improving those in TSK, aiding other methods of analysis as well.  This is the same reason individuals that have implemented several other carvers are participating: de-duplication of effort.  [[User:RB|RB]]
+
 
+
= Implementation Timeline =
+
# gather the available resources/ideas/wishes/needs etc. (I guess we're in this phase)
+
# start discussing a high level design (in terms of algorithm, facilities, information needed)
+
## input formats facility
+
## partition/volume facility
+
## file system facility
+
## file format facility
+
## content facility
+
## how to deal with fragment detection (do the validators allow for fragment detection?)
+
## how to deal with recombination of fragments
+
## do we want multiple carving phases in light of speed/precision tradeoffs
+
# start detailing parts of the design
+
## Discuss options for a grammar driven validator?
+
## Hard-coded plug-ins?
+
## Which existing code can we use?
+
# start building/assembling parts of the tooling for a prototype
+
## Implement simple file carving with validation.
+
## Implement gap carving
+
# Initial Release
+
# Implement the ''threaded carving'' that [[User:.FUF|.FUF]] is describing above.
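
As a rough illustration of the "simple file carving with validation" step and of the plug-in requirement, a header/footer carver with registered validators might look like this. Everything here is a sketch under assumptions: <code>register</code>, <code>VALIDATORS</code>, and <code>carve</code> are hypothetical names, the whole image is held in memory, and PNG is used only because its chunk CRCs make a compact validator.

```python
import zlib
from typing import Callable, Dict, Iterator, Tuple

Validator = Callable[[bytes], bool]
VALIDATORS: Dict[str, Tuple[bytes, bytes, Validator]] = {}


def register(name: str, header: bytes, footer: bytes):
    """Decorator that registers a validator plug-in for one file format."""
    def wrap(fn: Validator) -> Validator:
        VALIDATORS[name] = (header, footer, fn)
        return fn
    return wrap


PNG_SIG = b"\x89PNG\r\n\x1a\n"


@register("png", PNG_SIG, b"IEND\xaeB`\x82")
def validate_png(blob: bytes) -> bool:
    """Walk the chunk list and verify every chunk CRC up to IEND."""
    if not blob.startswith(PNG_SIG):
        return False
    pos = len(PNG_SIG)
    while pos + 12 <= len(blob):
        length = int.from_bytes(blob[pos:pos + 4], "big")
        end = pos + 8 + length                 # end of chunk type + data
        if end + 4 > len(blob):
            return False
        if zlib.crc32(blob[pos + 4:end]) != int.from_bytes(blob[end:end + 4], "big"):
            return False
        is_last = blob[pos + 4:pos + 8] == b"IEND"
        pos = end + 4
        if is_last:
            return pos == len(blob)
    return False


def carve(image: bytes) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (format, offset, data) for each header/footer span that validates."""
    for name, (header, footer, validate) in VALIDATORS.items():
        start = image.find(header)
        while start != -1:
            stop = image.find(footer, start + len(header))
            while stop != -1:
                blob = image[start:stop + len(footer)]
                if validate(blob):
                    yield name, start, blob
                    break
                stop = image.find(footer, stop + 1)
            start = image.find(header, start + 1)
```

Keeping the header, footer, and validation routine together in one registry entry is one possible way to let hard-coded plug-ins and a later grammar-driven validator share the same carving loop.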
+

Latest revision as of 05:53, 31 January 2009

The Excel Spreadsheet (XLSB) file format has the .xlsb extension. This file type originates from Microsoft Excel. However, other spreadsheet software can be used to display these files as well. These include:

OpenOffice (probably as of version 3)

MIME types

File signature

Microsoft Excel spreadsheets of version 2007 are stored in a ZIP archive file. These files therefore start with the ZIP file signature (hex 50 4B 03 04, i.e. "PK" followed by the bytes 03 04).
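
A minimal sketch of how this signature could be checked is shown below. The function names are hypothetical, and the member name <code>xl/workbook.bin</code> used to tell XLSB apart from other ZIP-based formats is an assumption about how these files are usually packaged, not something stated on this page.

```python
import io
import zipfile

ZIP_LOCAL_HEADER = b"PK\x03\x04"  # signature of the first local file header


def has_zip_signature(data: bytes) -> bool:
    """True if the data starts with the ZIP local file header signature."""
    return data[:4] == ZIP_LOCAL_HEADER


def looks_like_xlsb(data: bytes) -> bool:
    """Signature check plus a look inside the archive (assumed member name)."""
    if not has_zip_signature(data):
        return False
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return "xl/workbook.bin" in zf.namelist()
    except zipfile.BadZipFile:
        return False
```

The signature alone cannot distinguish XLSB from XLSX or any other ZIP archive, which is why the second function also inspects the archive contents.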

See Also

Excel Spreadsheet (XLS)

Excel Spreadsheet (XLSX)

Excel 2007 Binary File format by Microsoft