Difference between pages "Bzip2" and "Gzip"

From ForensicsWiki
(Difference between pages)
 
 
Line 1: Line 1:
 
{{expand}}
 
{{expand}}
  
The bzip2 (.bz2) file consists of a single bzip2 stream. The bzip2 stream consists of:
+
The gzip file (.gz) format consists of:
* The stream header.
+
* a file header
 +
* optional extra headers, such as the original file name,
 +
* a body, containing a DEFLATE-compressed payload
 +
* an 8-byte footer, containing a CRC-32 checksum and the length of the original uncompressed data.
  
The stream header is 4 bytes in size and contains:
+
=== File header ===
 +
The file header is 10 bytes in size and contains:
 
{| class="wikitable"
 
{| class="wikitable"
 
! align="left"| Offset
 
! align="left"| Offset
Line 13: Line 17:
 
| 0
 
| 0
 
| 2
 
| 2
| "BZ"
+
| 0x1f 0x8b
| Signature (magic number)
+
| Signature (or identification byte 1 and 2)
 
|-
 
|-
 
| 2
 
| 2
 
| 1
 
| 1
 
|
 
|
| Version <br> 'h' for Bzip2 ('H'uffman coding), '0' for Bzip1 (deprecated)
+
| Compression Method
 
|-
 
|-
 
| 3
 
| 3
 
| 1
 
| 1
 
|
 
|
| Block size <br> Value is defined in increments of 100 kB <br> '1'..'9' block-size 100 kB-900 kB (uncompressed) <br> <b>Note: the reference implementation uses decimal units, i.e. multiples of 100,000 bytes, not KiB</b>
+
| Flags
 +
|-
 +
| 4
 +
| 4
 +
|
 +
| Last modification time <br> Contains a POSIX timestamp.
 +
|-
 +
| 8
 +
| 1
 +
|
 +
| Extra flags
 +
|-
 +
| 9
 +
| 1
 +
|
 +
| Operating system <br> Value that indicates on which operating system the gzip file was created.
 
|}
 
|}
  
* followed by zero or more compressed blocks
+
==== Extra flags ====
<pre>
+
If compression method is 8 the following extra flags can be defined:
.compressed_magic:48            = 0x314159265359 (BCD (pi))
+
* 0x02 - compressor used maximum compression, slowest algorithm
.crc:32                        = checksum for this block
+
* 0x04 - compressor used fastest algorithm
.randomised:1                  = 0=>normal, 1=>randomised (deprecated)
+
.origPtr:24                    = starting pointer into BWT for after untransform
+
.huffman_used_map:16            = bitmap, of ranges of 16 bytes, present/not present
+
.huffman_used_bitmaps:0..256    = bitmap, of symbols used, present/not present (multiples of 16)
+
.huffman_groups:3              = 2..6 number of different Huffman tables in use
+
.selectors_used:15              = number of times that the Huffman tables are swapped (each 50 bytes)
+
*.selector_list:1..6            = zero-terminated bit runs (0..62) of MTF'ed Huffman table (*selectors_used)
+
.start_huffman_length:5        = 0..20 starting bit length for Huffman deltas
+
*.delta_bit_length:1..40        = 0=>next symbol; 1=>alter length
+
                                                { 1=>decrement length;  0=>increment length } (*(symbols+2)*groups)
+
.contents:2..∞                  = Huffman encoded data stream until end of block
+
</pre>
+
 
+
* immediately followed by an end-of-stream marker containing a 32-bit CRC for the uncompressed data.
+
<pre>
+
.eos_magic:48                  = 0x177245385090 (BCD sqrt(pi))
+
.crc:32                        = checksum for whole stream
+
.padding:0..7                  = align to whole byte
+
</pre>
+
 
+
The compressed blocks are bit-aligned and no padding occurs.
+
  
 
== External Links ==
 
== External Links ==
  
* [http://en.wikipedia.org/wiki/Bzip2 Wikipedia: bzip2]
+
* [http://www.gzip.org/format.txt The gzip file format], by the [http://www.gzip.org/ gzip project]
 +
* [http://www.gzip.org/algorithm.txt The gzip compression algorithm], by the [http://www.gzip.org/ gzip project]
 +
* [http://tools.ietf.org/html/rfc1952 RFC1952: GZIP file format specification version 4.3], by [[IETF]]
 +
* [http://en.wikipedia.org/wiki/Gzip Wikipedia: gzip]
  
 
[[Category:File Formats]]
 
[[Category:File Formats]]

Revision as of 02:31, 28 November 2013


Please help to improve this article by expanding it.
Further information might be found on the discussion page.

The gzip file (.gz) format consists of:

  • a file header
  • optional extra headers, such as the original file name
  • a body, containing a DEFLATE-compressed payload
  • an 8-byte footer, containing a CRC-32 checksum and the length of the original uncompressed data (modulo 2^32)
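
The footer layout can be checked with a short Python sketch. It assumes a file with a single gzip member and, following RFC 1952, that both footer fields are stored as little-endian 32-bit integers. The function name `read_gzip_footer` is hypothetical and only used for illustration.

```python
import gzip
import struct
import zlib


def read_gzip_footer(data: bytes) -> tuple:
    """Return (crc32, isize) taken from the last 8 bytes of a gzip file.

    Assumes a single-member file; concatenated members each carry
    their own footer and are not handled here.
    """
    if len(data) < 18:  # 10-byte header + 8-byte footer, body not counted
        raise ValueError("too short to be a gzip file")
    # CRC-32 of the uncompressed data, then its length modulo 2**32.
    crc32, isize = struct.unpack("<II", data[-8:])
    return crc32, isize


if __name__ == "__main__":
    payload = b"forensics" * 100
    blob = gzip.compress(payload)
    crc32, isize = read_gzip_footer(blob)
    # Both values can be recomputed from the uncompressed payload.
    assert crc32 == zlib.crc32(payload)
    assert isize == len(payload) % 2**32
```

Because the stored length is only 32 bits wide, it is a consistency check rather than a reliable size for inputs of 4 GiB or more.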

File header

The file header is 10 bytes in size and contains:

Offset  Size  Value      Description
0       2     0x1f 0x8b  Signature (identification bytes 1 and 2)
2       1                Compression method
3       1                Flags
4       4                Last modification time; contains a POSIX timestamp.
8       1                Extra flags
9       1                Operating system; value that indicates on which operating system the gzip file was created.
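
As a minimal sketch, the 10-byte header described above can be unpacked in Python with `struct`. The function name `parse_gzip_header` and the dictionary keys are hypothetical. The sketch assumes, per RFC 1952, that the timestamp is little-endian and that a value of 0 means no timestamp is available.

```python
import gzip
import struct
from datetime import datetime, timezone

GZIP_MAGIC = b"\x1f\x8b"


def parse_gzip_header(data: bytes) -> dict:
    """Unpack the fixed 10-byte gzip file header into a dictionary."""
    if len(data) < 10:
        raise ValueError("need at least 10 bytes for the file header")
    # "<" = little-endian, no padding: 2 + 1 + 1 + 4 + 1 + 1 = 10 bytes.
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<2sBBIBB", data[:10])
    if magic != GZIP_MAGIC:
        raise ValueError("signature is not 0x1f 0x8b")
    return {
        "method": method,        # offset 2: compression method
        "flags": flags,          # offset 3: flags
        "mtime_raw": mtime,      # offset 4: POSIX timestamp as stored
        "mtime": datetime.fromtimestamp(mtime, tz=timezone.utc) if mtime else None,
        "extra_flags": xfl,      # offset 8: extra flags
        "os": os_id,             # offset 9: operating system
    }


if __name__ == "__main__":
    # The mtime keyword of gzip.compress requires Python 3.8 or later.
    header = parse_gzip_header(gzip.compress(b"evidence", mtime=1385605860))
    print(header)
```

The timestamp is converted to UTC here so that the result does not depend on the examiner's local time zone.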

Extra flags

If the compression method is 8 (DEFLATE), the following extra flag values are defined:

  • 0x02 - compressor used maximum compression, slowest algorithm
  • 0x04 - compressor used fastest algorithm
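
The two values above could be interpreted with a small lookup, sketched here in Python. The name `describe_extra_flags` and the fallback strings are assumptions made for illustration. The values are hints written by the compressor, so they should not be treated as proof of the settings actually used.

```python
# Extra flag values defined for compression method 8 (DEFLATE).
XFL_MEANINGS = {
    0x02: "maximum compression, slowest algorithm",
    0x04: "fastest algorithm",
}


def describe_extra_flags(method: int, xfl: int) -> str:
    """Describe the extra flags byte (offset 8) given the method byte (offset 2)."""
    if method != 8:
        # The values listed above only apply to compression method 8.
        return "undefined for compression method %d" % method
    # Any other value is reported as carrying no hint (an assumption of this sketch).
    return XFL_MEANINGS.get(xfl, "no hint recorded (0x%02x)" % xfl)


if __name__ == "__main__":
    header = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x02\x03"  # hand-built example header
    print(describe_extra_flags(header[2], header[8]))
```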

External Links