Difference between pages "Bzip2" and "Gzip"

From ForensicsWiki

{{expand}}

A bzip2 (.bz2) file usually consists of a single bzip2 stream, although several streams can be concatenated into one file. The bzip2 stream consists of:
* The stream header.

The stream header is 4 bytes in size and contains:

{| class="wikitable"
! align="left"| Offset
! align="left"| Size
! align="left"| Value
! align="left"| Description
|-
| 0
| 2
| "BZ" (0x42 0x5a)
| Signature (magic number)
|-
| 2
| 1
|
| Version <br> 'h' for bzip2 ('h' for Huffman coding), '0' for bzip1 (deprecated)
|-
| 3
| 1
|
| Block size <br> An ASCII digit '1'..'9' giving the block size in units of 100 kB, i.e. 100 kB to 900 kB of uncompressed data per block <br> <b>Note: the reference implementation uses units of 100,000 bytes (decimal kB), not 102,400 bytes (KiB)</b>
|}
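The 4-byte stream header can be validated with a few byte comparisons. The following Python sketch uses a hypothetical helper, `parse_bzip2_header`, which is not part of any library. It assumes the block-size digit maps to units of 100,000 bytes, as in the reference bzip2 implementation, and it rejects the deprecated bzip1 ('0') version rather than decoding it.

```python
import bz2


def parse_bzip2_header(data: bytes) -> dict:
    """Validate and decode the 4-byte bzip2 stream header (sketch)."""
    if len(data) < 4 or data[:2] != b"BZ":
        raise ValueError("missing 'BZ' signature")
    version = chr(data[2])
    if version != "h":
        # '0' would be the deprecated bzip1 format; it is not handled here.
        raise ValueError(f"unsupported version byte {version!r}")
    digit = chr(data[3])
    if digit not in "123456789":
        raise ValueError(f"invalid block size digit {digit!r}")
    level = int(digit)
    return {
        "version": version,
        "block_size_digit": level,
        # Assumption: units of 100,000 bytes, as in the reference implementation.
        "max_block_bytes": level * 100_000,
    }


if __name__ == "__main__":
    # Python's bz2 module writes standard 'BZh' headers.
    print(parse_bzip2_header(bz2.compress(b"example", compresslevel=1)))
```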
  
* followed by zero or more compressed blocks
<pre>
.compressed_magic:48            = 0x314159265359 (pi in BCD)
.crc:32                         = checksum for this block
.randomised:1                   = 0=>normal, 1=>randomised (deprecated)
.origPtr:24                     = starting pointer into BWT for after untransform
.huffman_used_map:16            = bitmap of 16-byte ranges, present/not present
.huffman_used_bitmaps:0..256    = bitmap of symbols used, present/not present (multiples of 16)
.huffman_groups:3               = 2..6 number of different Huffman tables in use
.selectors_used:15              = number of times that the Huffman tables are swapped (each 50 symbols)
*.selector_list:1..6            = zero-terminated bit runs (0..62) of MTF'ed Huffman table (*selectors_used)
.start_huffman_length:5         = 0..20 starting bit length for Huffman deltas
*.delta_bit_length:1..40        = 0=>next symbol; 1=>alter length
                                  { 1=>decrement length; 0=>increment length } (*(symbols+2)*groups)
.contents:2..∞                  = Huffman encoded data stream until end of block
</pre>
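The per-block crc field covers the block's original, uncompressed bytes. The reference implementation uses a non-reflected CRC-32: polynomial 0x04C11DB7, initial value 0xFFFFFFFF, and a final inversion. This variant is often catalogued as CRC-32/BZIP2, and it differs from the reflected CRC-32 used by gzip and `zlib.crc32`.

The sketch below computes it one bit at a time for clarity. The function name is hypothetical. The demo assumes a stream whose first block is byte-aligned, so its crc field sits at byte offset 10 (the 4-byte stream header plus the 6-byte block magic).

```python
import bz2


def crc32_bzip2(data: bytes) -> int:
    """Bitwise CRC-32/BZIP2 (MSB-first, poly 0x04C11DB7, init/xorout 0xFFFFFFFF)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24                      # feed the byte into the top 8 bits
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc ^ 0xFFFFFFFF                    # final inversion


if __name__ == "__main__":
    payload = b"hello world"
    stream = bz2.compress(payload)
    # First block: 4-byte stream header + 6-byte block magic, then the 32-bit CRC.
    stored = int.from_bytes(stream[10:14], "big")
    print(hex(stored), hex(crc32_bzip2(payload)))
```

A table-driven version is much faster. The bitwise loop is kept here only because it makes the polynomial and bit order explicit.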
 
* immediately followed by an end-of-stream marker containing a 32-bit CRC for the uncompressed data.
<pre>
.eos_magic:48                   = 0x177245385090 (sqrt(pi) in BCD)
.crc:32                         = checksum for whole stream
.padding:0..7                   = align to whole byte
</pre>
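The end-of-stream crc is not a CRC computed directly over all of the data. In the reference implementation (compress.c) it is built from the per-block CRCs, in block order:

1. Rotate the running 32-bit value left by one bit.
2. XOR in the next block's CRC.

A minimal sketch follows; the function name is hypothetical.

```python
from typing import Iterable


def combine_stream_crc(block_crcs: Iterable[int]) -> int:
    """Combine per-block CRCs into the bzip2 end-of-stream CRC (sketch)."""
    combined = 0
    for block_crc in block_crcs:
        # Rotate the 32-bit running value left by one bit, then mix in the block CRC.
        combined = ((combined << 1) | (combined >> 31)) & 0xFFFFFFFF
        combined ^= block_crc
    return combined


if __name__ == "__main__":
    # No blocks (empty input) gives 0; a single block gives that block's own CRC.
    print(hex(combine_stream_crc([])), hex(combine_stream_crc([0x12345678])))
```

Because of the rotation, the result depends on block order. When the stored stream CRC does not match the recombined block CRCs of a carved stream, that suggests missing or reordered blocks.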
 
The compressed blocks are bit-aligned: no padding occurs between blocks, so a block can begin at any bit position in the file.
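Because blocks are bit-aligned, the block and end-of-stream magics must be searched for at every bit offset, not only at byte boundaries. The bzip2recover utility shipped with bzip2 likewise scans for these markers to salvage blocks from damaged files.

The following Python sketch uses a 48-bit sliding window. The function name is hypothetical. The sketch reads the whole input into memory, and a pure-Python bit loop is slow on large images, so it is an illustration rather than a carving tool.

```python
import bz2

BLOCK_MAGIC = 0x314159265359   # start of every compressed block
EOS_MAGIC = 0x177245385090     # start of the end-of-stream marker
MASK48 = (1 << 48) - 1


def find_bit_offsets(data: bytes, magic: int) -> list[int]:
    """Return every bit offset (from the start of data) where a 48-bit magic begins."""
    offsets = []
    window = 0
    for byte_index, byte in enumerate(data):
        for shift in range(7, -1, -1):                  # most significant bit first
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK48
            bit_index = byte_index * 8 + (7 - shift)    # index of the bit just read
            # Only compare once the window holds a full 48 bits.
            if bit_index >= 47 and window == magic:
                offsets.append(bit_index - 47)
    return offsets


if __name__ == "__main__":
    stream = bz2.compress(b"hello world")
    print(find_bit_offsets(stream, BLOCK_MAGIC), find_bit_offsets(stream, EOS_MAGIC))
```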
  
 
== External Links ==
* [http://en.wikipedia.org/wiki/Bzip2 Wikipedia: bzip2]

[[Category:File Formats]]

Revision as of 01:31, 28 November 2013

File format

A gzip (.gz) file consists of one or more members. Each member consists of:

  • a file header
  • optional extra headers, such as the original file name
  • a body, containing a DEFLATE-compressed payload
  • an 8-byte footer, containing a CRC-32 checksum and the length of the original uncompressed data (modulo 2^32).
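The layout above can be walked with Python's standard `struct` and `zlib` modules. This sketch assumes a single-member file with no trailing data.

- The flag bit values come from RFC 1952, not from this page: FHCRC = 0x02, FEXTRA = 0x04, FNAME = 0x08, FCOMMENT = 0x10.
- The function names are hypothetical.
- The body is decompressed as raw DEFLATE (negative `wbits`).
- The footer is checked against `zlib.crc32` and against the payload length modulo 2^32.

```python
import gzip
import struct
import zlib

# Header flag bits defined in RFC 1952.
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10


def split_gzip_member(data: bytes) -> tuple:
    """Split a single-member gzip file into (header, deflate_body, footer)."""
    if data[:2] != b"\x1f\x8b":
        raise ValueError("missing gzip signature")
    flags = data[3]
    pos = 10                                   # end of the fixed 10-byte header
    if flags & FEXTRA:                         # 2-byte length, then the extra field
        (xlen,) = struct.unpack_from("<H", data, pos)
        pos += 2 + xlen
    if flags & FNAME:                          # zero-terminated original file name
        pos = data.index(b"\x00", pos) + 1
    if flags & FCOMMENT:                       # zero-terminated comment
        pos = data.index(b"\x00", pos) + 1
    if flags & FHCRC:                          # 2-byte header checksum (CRC16)
        pos += 2
    return data[:pos], data[pos:-8], data[-8:]


def check_member(data: bytes) -> bytes:
    """Decompress the body and verify it against the 8-byte footer."""
    _header, body, footer = split_gzip_member(data)
    payload = zlib.decompress(body, -15)       # raw DEFLATE, no zlib/gzip wrapper
    crc, isize = struct.unpack("<II", footer)  # both fields are little-endian
    if crc != zlib.crc32(payload):
        raise ValueError("CRC-32 mismatch")
    if isize != len(payload) % 2**32:
        raise ValueError("length mismatch")
    return payload


if __name__ == "__main__":
    blob = gzip.compress(b"forensics " * 100)
    header, body, footer = split_gzip_member(blob)
    print(len(header), len(body), footer.hex())
    print(check_member(blob)[:20])
```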

File header

The file header is 10 bytes in size and contains:

Offset  Size  Value      Description
0       2     0x1f 0x8b  Signature (identification bytes 1 and 2)
2       1                Compression method (8 = deflate)
3       1                Flags
4       4                Last modification time: a POSIX timestamp (0 means no timestamp is available)
8       1                Extra flags
9       1                Operating system: the operating system on which the gzip file was created

Multi-byte values, such as the modification time, are stored least-significant byte first (little-endian).
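The fixed 10-byte header maps directly onto the `struct` format string "<2sBBIBB". In order, the fields are signature, method, flags, modification time, extra flags and operating system, all little-endian.

The sketch below is hypothetical helper code. The operating-system names are a subset of the codes listed in RFC 1952; they are not part of this page's table.

```python
import gzip
import struct
from datetime import datetime, timezone
from typing import NamedTuple, Optional

# A subset of the operating-system codes listed in RFC 1952.
OS_NAMES = {0: "FAT", 3: "Unix", 7: "Macintosh", 11: "NTFS", 255: "unknown"}


class GzipHeader(NamedTuple):
    signature: bytes
    method: int
    flags: int
    mtime: int
    extra_flags: int
    os_code: int


def parse_gzip_header(data: bytes) -> GzipHeader:
    """Unpack the fixed 10-byte gzip header (all fields little-endian)."""
    header = GzipHeader(*struct.unpack_from("<2sBBIBB", data, 0))
    if header.signature != b"\x1f\x8b":
        raise ValueError("missing gzip signature")
    return header


def mtime_as_utc(header: GzipHeader) -> Optional[datetime]:
    """Return the modification time in UTC, or None when it is 0 (not set)."""
    if header.mtime == 0:
        return None
    return datetime.fromtimestamp(header.mtime, tz=timezone.utc)


if __name__ == "__main__":
    hdr = parse_gzip_header(gzip.compress(b"example", mtime=1_000_000_000))
    print(hdr, mtime_as_utc(hdr), OS_NAMES.get(hdr.os_code, "other"))
```

The timestamp is whatever the compressing program chose to store, so it should be read as a claim recorded in the file rather than as a verified time.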

Extra flags

If the compression method is 8 (deflate), the following extra flag values are defined:

  • 0x02 - compressor used maximum compression, slowest algorithm
  • 0x04 - compressor used fastest algorithm
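A small, hypothetical helper that interprets the extra-flags byte according to the two values above. The demo writes gzip data with zlib's own gzip wrapper (`wbits=31`). Which XFL value a given compressor actually emits for a given level is implementation-specific, so the byte is best treated as a weak hint about the settings used, not as proof.

```python
import zlib

XFL_MEANINGS = {
    0x02: "maximum compression, slowest algorithm",
    0x04: "fastest algorithm",
}


def describe_extra_flags(method: int, xfl: int) -> str:
    """Interpret the gzip XFL byte (offset 8); only defined for method 8 (deflate)."""
    if method != 8:
        return "undefined for this compression method"
    return XFL_MEANINGS.get(xfl, "no level hint")


if __name__ == "__main__":
    for level in (1, 6, 9):
        comp = zlib.compressobj(level, zlib.DEFLATED, 31)   # wbits=31: gzip wrapper
        data = comp.compress(b"example " * 50) + comp.flush()
        print(level, data[8], describe_extra_flags(data[2], data[8]))
```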

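External Links

  • The gzip file format (http://www.gzip.org/format.txt), by the gzip project (http://www.gzip.org/)
  • The gzip compression algorithm (http://www.gzip.org/algorithm.txt), by the gzip project (http://www.gzip.org/)
  • RFC 1952: GZIP file format specification version 4.3 (http://tools.ietf.org/html/rfc1952), by the IETF
  • Wikipedia: gzip (http://en.wikipedia.org/wiki/Gzip)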