Gzip

From Forensics Wiki

{{expand}}

== File format ==

The gzip file (.gz) format consists of:
* a file header
* optional headers
** extra fields
** original file name
** comment
** header checksum
* compressed data (most commonly DEFLATE, stored as a raw stream without a zlib header)
* a file footer
  
{| class="wikitable"
! align="left"| Characteristics
! Description
|-
| Byte order
| little-endian
|-
| Date and time values
| POSIX timestamp (seconds since 1970-01-01 00:00:00 UTC)
|-
| Character string
| ISO 8859-1 (LATIN-1)
|}

=== File header ===

The file header is 10 bytes in size and contains:
{| class="wikitable"
! align="left"| Offset
! Size
! Value
! Description
|-
| 0
| 2
| 0x1f 0x8b
| Signature (identification bytes ID1 and ID2)
|-
| 2
| 1
|
| Compression method (CM)
|-
| 3
| 1
|
| Flags (FLG)
|-
| 4
| 4
|
| Last modification time (MTIME) <br> Contains a POSIX timestamp; 0 means that no timestamp is available.
|-
| 8
| 1
|
| Compression flags (XFL, also called extra flags)
|-
| 9
| 1
|
| Operating system (OS) <br> Value that indicates the operating system (type of file system) on which the gzip file was created.
|}
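Because every multi-byte value is little-endian, the fixed header can be decoded with a single unpack. The following Python sketch is illustrative only: the function name <code>parse_gzip_header</code> and the keys of the returned dictionary are hypothetical, and it assumes the header bytes are already in memory.

```python
import struct
from datetime import datetime, timezone

# ID1, ID2, CM, FLG, MTIME, XFL, OS; "<" = little-endian, no padding (10 bytes).
GZIP_HEADER = struct.Struct("<BBBBIBB")


def parse_gzip_header(data: bytes) -> dict:
    """Decode the fixed 10-byte gzip header at the start of `data`."""
    if len(data) < GZIP_HEADER.size:
        raise ValueError("need at least 10 bytes for a gzip header")
    id1, id2, method, flags, mtime, xfl, os_code = GZIP_HEADER.unpack_from(data, 0)
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("missing gzip signature 0x1f 0x8b")
    return {
        "method": method,  # 8 = deflate
        "flags": flags,
        # POSIX timestamp in UTC; 0 means that no timestamp is available.
        "mtime": datetime.fromtimestamp(mtime, tz=timezone.utc) if mtime else None,
        "xfl": xfl,
        "os": os_code,
    }
```

For example, the header produced by Python's <code>gzip.compress(b"hi", mtime=0)</code> should decode to compression method 8, flags 0 and no timestamp.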
  
==== Compression method ====

{| class="wikitable"
! align="left"| Value
! Identifier
! Description
|-
| 0 - 7
|
| Reserved
|-
| 8
| deflate
| deflate compressed data
|}

==== Flags ====

{| class="wikitable"
! align="left"| Value
! Identifier
! Description
|-
| 0x01
| FTEXT
| If set, the uncompressed data is probably text rather than binary data. <br> This is an optional hint, for example for end-of-line conversion of cross-platform text files, and is not enforced.
|-
| 0x02
| FHCRC
| The file contains a header checksum (CRC-16)
|-
| 0x04
| FEXTRA
| The file contains extra fields
|-
| 0x08
| FNAME
| The file contains an original file name string
|-
| 0x10
| FCOMMENT
| The file contains a comment
|-
| 0x20
|
| Reserved
|-
| 0x40
|
| Reserved
|-
| 0x80
|
| Reserved
|}

<b>Notes:</b>
* Reserved flag bits must be zero.
* The FHCRC bit was never set by versions of gzip up to 1.2.4, even though it was documented with a different meaning in gzip 1.2.4.
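The two signature bytes are short enough to occur by chance in unrelated data, so a carver that also requires compression method 8 and zero reserved flag bits can discard many false positives. A minimal Python sketch, assuming the data being carved is already in memory; <code>find_gzip_candidates</code> is a hypothetical helper, and each hit is only a candidate that still has to be validated.

```python
import re

# Signature 0x1f 0x8b immediately followed by compression method 8 (deflate).
CANDIDATE = re.compile(rb"\x1f\x8b\x08")
RESERVED_FLAG_BITS = 0x20 | 0x40 | 0x80  # must be zero in a valid header


def find_gzip_candidates(blob: bytes) -> list[int]:
    """Return offsets in `blob` that could be the start of a gzip header.

    This is a heuristic pre-filter only: a hit still has to be confirmed,
    for example by decompressing the data and checking the footer.
    """
    offsets = []
    for match in CANDIDATE.finditer(blob):
        start = match.start()
        if start + 10 > len(blob):
            continue  # not enough bytes left for the fixed 10-byte header
        if (blob[start + 3] & RESERVED_FLAG_BITS) == 0:
            offsets.append(start)
    return offsets
```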
 
==== Compression flags ====

This value contains flags specific to the compression method.

===== Compression flags - deflate =====

If the compression method is 8 (deflate), the following compression flag values are used:
{| class="wikitable"
! align="left"| Value
! Identifier
! Description
|-
| 0x02
|
| compressor used maximum compression, slowest algorithm
|-
| 0x04
|
| compressor used fastest algorithm
|}
 
==== Operating System ====

{| class="wikitable"
! align="left"| Value
! Identifier
! Description
|-
| 0
|
| FAT filesystem (MS-DOS, OS/2, NT/Win32)
|-
| 1
|
| Amiga
|-
| 2
|
| VMS (or OpenVMS)
|-
| 3
|
| Unix
|-
| 4
|
| VM/CMS
|-
| 5
|
| Atari TOS
|-
| 6
|
| HPFS filesystem (OS/2, NT)
|-
| 7
|
| Macintosh
|-
| 8
|
| Z-System
|-
| 9
|
| CP/M
|-
| 10
|
| TOPS-20
|-
| 11
|
| NTFS filesystem (NT)
|-
| 12
|
| QDOS
|-
| 13
|
| Acorn RISCOS
|-
| 255
|
| unknown
|}
 
=== Optional headers ===

==== Extra fields ====

This value is present in the file if the FEXTRA flag is set in the file header flags.

The extra field is variable in size and contains:
{| class="wikitable"
! align="left"| Offset
! Size
! Value
! Description
|-
| 0
| 2
|
| Extra field data size (XLEN) <br> Value in bytes.
|-
| 2
| ...
|
| Extra field data <br> Contains the number of bytes indicated by the extra field data size.
|}
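When the FEXTRA flag is set, the extra field immediately follows the 10-byte file header. A minimal Python sketch of reading it; the helper name is hypothetical and it returns the raw extra field data without interpreting its contents.

```python
import struct


def read_extra_field(data: bytes, pos: int) -> tuple[bytes, int]:
    """Read the FEXTRA field starting at offset `pos`.

    Returns the raw extra field data and the offset just past it.
    """
    if pos + 2 > len(data):
        raise ValueError("truncated extra field size")
    (xlen,) = struct.unpack_from("<H", data, pos)  # little-endian data size
    start, end = pos + 2, pos + 2 + xlen
    if end > len(data):
        raise ValueError("truncated extra field data")
    return data[start:end], end
```

Called as <code>read_extra_field(data, 10)</code> when <code>data[3] & 0x04</code> is set, it returns the extra field data and the offset at which the next optional header (or the compressed data) begins.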
 
==== Original file name ====

This value is present in the file if the FNAME flag is set in the file header flags.

This is the original name of the file being compressed, with any directory components removed and, if the file being compressed is on a file system with case-insensitive names, forced to lower case.

Contains an ISO 8859-1 (LATIN-1) string terminated by a zero byte (0x00).
 
==== Comment ====

This value is present in the file if the FCOMMENT flag is set in the file header flags.

Contains an ISO 8859-1 (LATIN-1) string terminated by a zero byte (0x00). Line breaks should be denoted by a single line feed character (0x0a).
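The original file name and the comment are both stored as zero-terminated ISO 8859-1 strings, so a single helper can read either one. A minimal Python sketch with a hypothetical function name:

```python
def read_latin1_cstring(data: bytes, pos: int) -> tuple[str, int]:
    """Read a zero-terminated ISO 8859-1 string (FNAME or FCOMMENT) at `pos`.

    Returns the decoded string and the offset just past the 0x00 terminator.
    """
    end = data.find(b"\x00", pos)
    if end == -1:
        raise ValueError("string is not zero-terminated")
    return data[pos:end].decode("latin-1"), end + 1
```

When both fields are present the file name comes first, so the offset returned for it can be passed in to read the comment.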
 
==== Header checksum ====

This value is present in the file if the FHCRC flag is set in the file header flags.

The header checksum contains a CRC-16 that consists of the two least significant bytes of the CRC-32 of all bytes of the gzip header up to, but not including, the CRC-16.
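The CRC-16 can be recomputed with the ordinary CRC-32 from Python's <code>zlib</code> module by keeping only its low 16 bits. A minimal sketch, assuming <code>data</code> starts at the first byte of the gzip header and <code>crc_pos</code> is the offset of the stored CRC-16; both helper names are hypothetical.

```python
import zlib


def header_crc16(header_bytes: bytes) -> int:
    """Two least significant bytes of the CRC-32 over the given header bytes."""
    return zlib.crc32(header_bytes) & 0xFFFF


def check_header_crc16(data: bytes, crc_pos: int) -> bool:
    """Compare the stored little-endian CRC-16 at `crc_pos` with a fresh one."""
    stored = int.from_bytes(data[crc_pos:crc_pos + 2], "little")
    return stored == header_crc16(data[:crc_pos])
```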
 
=== File footer ===

The file footer is 8 bytes in size and contains:
{| class="wikitable"
! align="left"| Offset
! Size
! Value
! Description
|-
| 0
| 4
|
| Checksum (CRC-32) <br> CRC-32 of the uncompressed data.
|-
| 4
| 4
|
| Uncompressed data size <br> Value in bytes, modulo 2<sup>32</sup>.
|}
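Putting the pieces together, the following Python sketch walks one gzip member: it skips the optional headers indicated by the flags, inflates the raw DEFLATE stream (a negative <code>wbits</code> value tells <code>zlib</code> that there is no zlib header) and checks the CRC-32 and size stored in the footer. It is an illustration under simplifying assumptions rather than a complete validator: it only accepts compression method 8, skips the header CRC-16 without verifying it, and the function name is hypothetical.

```python
import struct
import zlib

# Flag bits from the Flags table.
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10


def read_gzip_member(data: bytes) -> tuple[bytes, bytes]:
    """Decompress one gzip member from the start of `data` and verify its footer.

    Returns (uncompressed data, bytes that follow the 8-byte footer).
    """
    if len(data) < 10 or data[:3] != b"\x1f\x8b\x08":
        raise ValueError("not a deflate-compressed gzip member")
    flags = data[3]
    pos = 10  # first byte after the fixed header
    if flags & FEXTRA:
        (xlen,) = struct.unpack_from("<H", data, pos)
        pos += 2 + xlen
    # Original file name, then comment: both zero-terminated.
    for flag in (FNAME, FCOMMENT):
        if flags & flag:
            pos = data.index(b"\x00", pos) + 1
    if flags & FHCRC:
        pos += 2  # skip the CRC-16 (not verified here)
    # Raw DEFLATE stream: negative wbits means "no zlib header or trailer".
    inflater = zlib.decompressobj(-zlib.MAX_WBITS)
    payload = inflater.decompress(data[pos:])
    if not inflater.eof:
        raise ValueError("truncated or corrupt deflate stream")
    footer = inflater.unused_data
    if len(footer) < 8:
        raise ValueError("truncated footer")
    crc32, isize = struct.unpack_from("<II", footer, 0)
    if crc32 != zlib.crc32(payload):
        raise ValueError("CRC-32 mismatch")
    if isize != len(payload) & 0xFFFFFFFF:
        raise ValueError("uncompressed size mismatch")
    return payload, footer[8:]
```

The second value returned holds any bytes that follow the footer. RFC 1952 allows a gzip file to consist of several concatenated members, so a caller could pass that remainder back in to read the next member.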
 
== See Also ==
* [[bzip2]]
* [[tar]]
 
== External Links ==

* [http://www.gzip.org/format.txt The gzip file format], by the [http://www.gzip.org/ gzip project]
* [http://www.gzip.org/algorithm.txt The gzip compression algorithm], by the [http://www.gzip.org/ gzip project]
* [http://tools.ietf.org/html/rfc1952 RFC1952: GZIP file format specification version 4.3], by [[IETF]]
* [http://en.wikipedia.org/wiki/Gzip Wikipedia: gzip]

[[Category:File Formats]]

Revision as of 08:01, 30 November 2013
