
Difference between pages "Gzip" and "Bzip2"

From ForensicsWiki
(Difference between pages)
 
 
Line 1: Line 1:
 
{{expand}}
 
{{expand}}
  
The gzip file (.gz) format consists of:
+
The bzip2 (.bz2) file consists of a single bzip2 stream. The bzip2 stream consists of:
* The file header:
+
* The stream header.
The file header is 10 bytes in size and contains:
+
 
 +
The stream header is 4 bytes in size and contains:
 
{| class="wikitable"
 
{| class="wikitable"
 
! align="left"| Offset
 
! align="left"| Offset
Line 12: Line 13:
 
| 0
 
| 0
 
| 2
 
| 2
| 0x1f 0x8b
+
| "BZ"
| Signature (or identification byte 1 and 2)
+
| Signature (magic number)
 
|-
 
|-
 
| 2
 
| 2
 
| 1
 
| 1
 
|
 
|
| Compression Method
+
| Version <br> 'h' for Bzip2 ('H'uffman coding), '0' for Bzip1 (deprecated)
 
|-
 
|-
 
| 3
 
| 3
 
| 1
 
| 1
 
|
 
|
| Flags
+
| Block size <br> Value is defined in increments of 100 kB <br> '1'..'9' block-size 100 kB-900 kB (uncompressed) <br> <b>Note: currently assumed that kB should be kiB</b>
|-
+
| 4
+
| 4
+
|
+
| Last modification time <br> Contains a POSIX timestamp.
+
|-
+
| 8
+
| 1
+
|
+
| Extra flags
+
|-
+
| 9
+
| 1
+
|
+
| Operating system <br> Value that indicates on which operating system the gzip file was created.
+
 
|}
 
|}
  
* optional extra headers, such as the original file name,
+
* followed by zero or more compressed blocks
* a body, containing a DEFLATE-compressed payload
+
<pre>
* an 8-byte footer, containing a CRC-32 checksum and the length of the original uncompressed data.
+
.compressed_magic:48            = 0x314159265359 (BCD (pi))
 +
.crc:32                        = checksum for this block
 +
.randomised:1                  = 0=>normal, 1=>randomised (deprecated)
 +
.origPtr:24                    = starting pointer into BWT for after untransform
 +
.huffman_used_map:16            = bitmap, of ranges of 16 bytes, present/not present
 +
.huffman_used_bitmaps:0..256    = bitmap, of symbols used, present/not present (multiples of 16)
 +
.huffman_groups:3              = 2..6 number of different Huffman tables in use
 +
.selectors_used:15              = number of times that the Huffman tables are swapped (each 50 bytes)
 +
*.selector_list:1..6            = zero-terminated bit runs (0..62) of MTF'ed Huffman table (*selectors_used)
 +
.start_huffman_length:5        = 0..20 starting bit length for Huffman deltas
 +
*.delta_bit_length:1..40        = 0=>next symbol; 1=>alter length
 +
                                                { 1=>decrement length;  0=>increment length } (*(symbols+2)*groups)
 +
.contents:2..∞                  = Huffman encoded data stream until end of block
 +
</pre>
 +
 
 +
* immediately followed by an end-of-stream marker containing a 32-bit CRC for the uncompressed data.
 +
<pre>
 +
.eos_magic:48                  = 0x177245385090 (BCD sqrt(pi))
 +
.crc:32                        = checksum for whole stream
 +
.padding:0..7                  = align to whole byte
 +
</pre>
 +
 
 +
The compressed blocks are bit-aligned and no padding occurs.
  
 
== External Links ==
 
== External Links ==
  
* [http://www.gzip.org/format.txt The gzip file format], by the [http://www.gzip.org/ gzip project]
+
* [http://en.wikipedia.org/wiki/Bzip2 Wikipedia: bzip2]
* [http://www.gzip.org/algorithm.txt The gzip compression algorithm], by the [http://www.gzip.org/ gzip project]
+
* [http://tools.ietf.org/html/rfc1952 RFC1952: GZIP file format specification version 4.3], by [[IETF]]
+
* [http://en.wikipedia.org/wiki/Gzip Wikipedia: gzip]
+
  
 
[[Category:File Formats]]
 
[[Category:File Formats]]

Revision as of 05:09, 28 November 2013



The bzip2 (.bz2) file consists of a single bzip2 stream. The bzip2 stream consists of:

  • The stream header.

The stream header is 4 bytes in size and contains:

Offset  Size  Value       Description
0       2     "BZ"        Signature (magic number)
2       1     'h' or '0'  Version: 'h' for Bzip2 ('H'uffman coding), '0' for Bzip1 (deprecated)
3       1     '1'..'9'    Block size in increments of 100 kB: '1'..'9' give a block size of 100 kB to 900 kB (uncompressed)

Note: the unit is decimal. The reference implementation multiplies the digit by 100,000 bytes, so kB here means 1000 bytes, not KiB.
  • followed by zero or more compressed blocks
.compressed_magic:48            = 0x314159265359 (BCD (pi))
.crc:32                         = checksum for this block
.randomised:1                   = 0=>normal, 1=>randomised (deprecated)
.origPtr:24                     = starting pointer into BWT for after untransform
.huffman_used_map:16            = bitmap, of ranges of 16 bytes, present/not present
.huffman_used_bitmaps:0..256    = bitmap, of symbols used, present/not present (multiples of 16)
.huffman_groups:3               = 2..6 number of different Huffman tables in use
.selectors_used:15              = number of times that the Huffman tables are swapped (each 50 symbols)
*.selector_list:1..6            = zero-terminated bit runs (0..62) of MTF'ed Huffman table (*selectors_used)
.start_huffman_length:5         = 0..20 starting bit length for Huffman deltas
*.delta_bit_length:1..40        = 0=>next symbol; 1=>alter length
                                  { 1=>decrement length;  0=>increment length } (*(symbols+2)*groups)
.contents:2..∞                  = Huffman encoded data stream until end of block
  • immediately followed by an end-of-stream marker containing a 32-bit CRC for the uncompressed data.
.eos_magic:48                   = 0x177245385090 (BCD sqrt(pi))
.crc:32                         = checksum for whole stream
.padding:0..7                   = align to whole byte
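
As an illustration of the layout above, the following Python sketch reads the 4-byte stream header and then the fixed-width fields that open the first block, or the end-of-stream marker if the stream contains no blocks. The names BitReader, crc32_bzip2 and parse_stream_start are hypothetical helpers, not part of any library. The sketch makes three assumptions that the description above does not state: fields are packed most significant bit first, 100 kB means 100,000 bytes, and the block checksum is the MSB-first CRC-32 with polynomial 0x04C11DB7 computed over the block's uncompressed bytes.

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # .compressed_magic
EOS_MAGIC = 0x177245385090    # .eos_magic


class BitReader:
    """Reads bit fields, most significant bit first (an assumption)."""

    def __init__(self, data, bit_pos=0):
        self.data = data
        self.pos = bit_pos

    def read(self, nbits):
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def crc32_bzip2(data):
    """Bitwise CRC-32, polynomial 0x04C11DB7, MSB first (assumed variant)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc ^ 0xFFFFFFFF


def parse_stream_start(data):
    """Parse the stream header and whatever 48-bit magic follows it."""
    if data[0:2] != b"BZ":
        raise ValueError("missing 'BZ' signature")
    version = chr(data[2])
    level = chr(data[3])
    if version != "h" or level not in "123456789":
        raise ValueError("unsupported version or block size")
    # Assumption: one block-size unit is 100,000 bytes.
    info = {"version": version, "block_size_bytes": int(level) * 100_000}

    reader = BitReader(data, bit_pos=32)  # first field after the header
    magic = reader.read(48)
    if magic == BLOCK_MAGIC:
        info["first_block"] = {
            "crc": reader.read(32),
            "randomised": reader.read(1),
            "orig_ptr": reader.read(24),
        }
    elif magic == EOS_MAGIC:
        info["first_block"] = None
        info["stream_crc"] = reader.read(32)
    else:
        raise ValueError("unexpected magic after stream header")
    return info


if __name__ == "__main__":
    payload = b"hello"
    info = parse_stream_start(bz2.compress(payload, 9))
    print(info)
    print("block CRC matches:", info["first_block"]["crc"] == crc32_bzip2(payload))
```

The comparison in the usage lines is only meaningful when the input fits in a single block, since each block carries its own checksum.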

The compressed blocks are bit-aligned and no padding occurs.
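
Because the blocks are bit-aligned, a later block or the end-of-stream marker can begin at any bit offset, so a byte-wise search for the magic values will generally miss them. A minimal sketch of a bit-wise search follows; find_magic is a hypothetical helper, and the sketch assumes bits are packed most significant bit first. A 48-bit match can in principle occur by chance inside compressed data, so the offsets it returns should be treated as candidates rather than confirmed block boundaries.

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # .compressed_magic
EOS_MAGIC = 0x177245385090    # .eos_magic


def find_magic(data, magic, nbits=48):
    """Return every bit offset at which the nbits-wide magic value occurs."""
    mask = (1 << nbits) - 1
    window = 0
    hits = []
    for i, byte in enumerate(data):
        for shift in range(7, -1, -1):
            # Slide a window of nbits over the data, one bit at a time.
            window = ((window << 1) | ((byte >> shift) & 1)) & mask
            bits_seen = i * 8 + (7 - shift) + 1
            if bits_seen >= nbits and window == magic:
                hits.append(bits_seen - nbits)
    return hits


if __name__ == "__main__":
    stream = bz2.compress(b"forensics " * 1000)
    print("block candidates (bit offsets):", find_magic(stream, BLOCK_MAGIC))
    print("end-of-stream candidates:", find_magic(stream, EOS_MAGIC))
```

The first block always starts at bit offset 32, directly after the 4-byte stream header. The end-of-stream marker is followed by the 32-bit stream CRC and 0 to 7 padding bits, which gives a simple consistency check against the total length of the data.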

External Links