ForensicsWiki will continue to operate as it has before and will not be shutting down. Thank you for your continued support of ForensicsWiki.

Difference between pages "Windows Prefetch File Format" and "Chrome Disk Cache Format"

From ForensicsWiki
(Difference between pages)
Jump to: navigation, search
(Header)
 
(Data stream)
 
Line 1: Line 1:
 
{{expand}}
 
{{expand}}
  
A Windows Prefetch file consists of one file header and multiple file sections with different content. Not all content has an obvious forensic value.
+
== Cache files ==
 +
The cache is stored in multiple:
 +
{| class="wikitable"
 +
|-
 +
! Filename
 +
! Description
 +
|-
 +
| index
 +
| The index file
 +
|-
 +
| data_#
 +
| Data block files
 +
|-
 +
| f_######
 +
| (Separate) data stream file
 +
|}
  
As far as have been possible to ascertain, there is no public description of the format. The description below has been synthesised from examination
+
== Cache address ==
of multiple prefetch files.
+
The cache address is 4 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| <i>If file type is 0 (Separate file)</i>
 +
|
 +
|
 +
|
 +
|-
 +
| 0.0
 +
| 28 bits
 +
|
 +
| File number <br> The value represents the value of # in f_######
 +
|-
 +
| <i>Else</i>
 +
|
 +
|
 +
|
 +
|-
 +
| 0.0
 +
| 16 bits
 +
|
 +
| Block number
 +
|-
 +
| 2.0
 +
| 8 bits
 +
|
 +
| File number (or file selector) <br> The value represents the value of # in data_#
 +
|-
 +
| 3.0
 +
| 2 bits
 +
|
 +
| Block size <br> The number of contiguous blocks where 0 represents 1 block and 3 represents 4 blocks.
 +
|-
 +
| 3.2
 +
| 2 bits
 +
|
 +
| Reserved
 +
|-
 +
| <i>Common</i>
 +
|
 +
|
 +
|
 +
|-
 +
| 3.4
 +
| 3 bits
 +
|
 +
| File type
 +
|-
 +
| 3.7
 +
| 1 bit
 +
|
 +
| Initialized flag
 +
|}
  
== Characteristics ==
+
=== File types ===
Integer values are stored in little-endian.
+
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 0
 +
| (Separate) data stream file
 +
|-
 +
| 1
 +
| (Rankings) block data file (36 byte block data file)
 +
|-
 +
| 2
 +
| 256 byte block data file
 +
|-
 +
| 3
 +
| 1024 byte block data file
 +
|-
 +
| 4
 +
| 4096 byte block data file
 +
|-
 +
|
 +
|
 +
|-
 +
| 6
 +
| Unknown; seen on Mac OS  X 0x6f430074
 +
|}
  
Strings are stored as UTF-16 little-endian without a byte-order-mark (BOM).
+
==== Examples ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 0x00000000
 +
| Not initialized
 +
|-
 +
| 0x8000002a
 +
| Data stream file: f_00002a
 +
|-
 +
| 0xa0010003
 +
| Block data file: data_1, block number 3, 1 block of size
 +
|}
  
Timestamps are stored as Windows Filetime in UTC.
+
== Index file format (index) ==
 +
Overview:
 +
* File header
 +
* least recently used (LRU) data (or eviction control data)
 +
* index table
  
== Header ==
+
=== File header ===
 +
The index file header (struct IndexHeader) is 256 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 4
 +
| "\xc3\xca\x03\xc1"
 +
| Signature
 +
|-
 +
| 4
 +
| 2
 +
|
 +
| Minor version
 +
|-
 +
| 6
 +
| 2
 +
|
 +
| Major version
 +
|-
 +
| 8
 +
| 4
 +
|
 +
| Number of entries
 +
|-
 +
| 12
 +
| 4
 +
|
 +
| Stored data size
 +
|-
 +
| 16
 +
| 4
 +
|
 +
| Last created file number <br> The value represents the value of # in f_######
 +
|-
 +
| 20
 +
| 4
 +
|
 +
| Dirty flag <br> Identifier for all entries being changed
 +
|-
 +
| 24
 +
| 4
 +
|
 +
| Usage statistics data cache address
 +
|-
 +
| 28
 +
| 4
 +
|
 +
| Table size <br> Where 0 represents 0x10000 (is this the same as the file size?)
 +
|-
 +
| 32
 +
| 4
 +
|
 +
| Signals a previous crash
 +
|-
 +
| 36
 +
| 4
 +
|
 +
| Identifier of an ongoing test or experiment
 +
|-
 +
| 40
 +
| 8
 +
|
 +
| Creation time
 +
|-
 +
| 48
 +
| 52 x 8 = 208
 +
|
 +
| Padding <br> Contains 0-byte values
 +
|}
  
This format has been observed on Windows XP, ...  will need to be modified for Vista/Win7 format
+
==== Format versions ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
|2.0
 +
|
 +
|-
 +
| 2.1
 +
|
 +
|}
  
 +
=== Least recently used (LRU) data ===
 +
The least recently used (LRU) data (struct LruData) is 112 bytes in size and consists of:
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Field
+
! offset
! Offset
+
! size
! Length
+
! value
! Type
+
! description
! Notes
+
 
|-
 
|-
| H1
+
| 0
| 0x0000
+
| 2 x 4 = 8
 +
|
 +
| Padding
 +
|-
 +
| 8
 
| 4
 
| 4
| DWORD
+
|  
| Format version (see format version section below)
+
| Filled flag <br> Value to indicate if when the cache was filled
 +
|-
 +
| 12
 +
| 5 x 4 = 20
 +
|
 +
| Array of sizes
 +
|-
 +
| 32
 +
| 5 x 4 = 20
 +
|
 +
| Array of head cache addresses
 +
|-
 +
| 52
 +
| 5 x 4 = 20
 +
|
 +
| Array of tail cache addresses
 
|-
 
|-
| H2
+
| 72
| 0x0004
+
 
| 4
 
| 4
| DWORD
+
|  
| Signature 'SCCA' (or in hexadecimal representation 0x53 0x43 0x43 0x4)
+
| Transaction cache address <br> Value to indicate an in-flight operation
 
|-
 
|-
| H3
+
| 76
| 0x0008
+
 
| 4
 
| 4
| DWORD?
+
|  
| Unknown - Values observed: 0x0F - Windows XP, 0x11 - Windows 7, Windows 8.1
+
| Operation <br> The in-flight operation
 
|-
 
|-
| H4
+
| 80
| 0x000C
+
 
| 4
 
| 4
| DWORD
+
|  
| Prefetch file length.
+
| Operations list <br> The in-flight operations list
 
|-
 
|-
| H5
+
| 84
|0x0010
+
| 7 x 4 = 28
| 60
+
|  
| USTR
+
| Padding <br> Contains 0-byte values
| Name of executable as Unicode string, truncated after 29 characters, if necessary, and terminated by an end-of-string character (U+0000). As it appears in the prefetch file file name.
+
|}
 +
 
 +
=== Array indexes ===
 +
The array indexes correspond to the file types.
 +
{| class="wikitable"
 
|-
 
|-
| H6
+
! Value
|0x004C
+
! Description
|4
+
|DWORD
+
|The prefetch hash, as it appears in the prefetch file name.
+
 
|-
 
|-
| H7
+
| 0
|0x0050
+
| Separate file
|4
+
|?
+
| Unknown (flags)? Values observed: 0 for almost all prefetch files (XP); 1 for NTOSBOOT-B00DFAAD.pf (XP)
+
 
|-
 
|-
 +
| 1
 +
| (Rankings) block data file
 +
|-
 +
| 2
 +
| 256 byte block data file
 +
|-
 +
| 3
 +
| 1024 byte block data file
 +
|-
 +
| 4
 +
| 4096 byte block data file
 
|}
 
|}
  
The following part of the header is version dependent. Below the structure for format version 17.
+
=== Index table ===
 +
The index table is an array of cache addresses.
 +
 
 +
== Data block file format (data_#) ==
 +
Overview:
 +
* File header
 +
* array of blocks
  
 +
=== File header ===
 +
The index file header (struct BlockFileHeader) is 8192 bytes in size and consists of:
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Field
+
! offset
! Offset
+
! size
! Length
+
! value
! Type
+
! description
! Notes
+
 
|-
 
|-
| H8
+
| 0
| 0x0054
+
 
| 4
 
| 4
| DWORD
+
| "\xc3\xca\x04\xc1"
| Offset to section A
+
| Signature
 
|-
 
|-
| H9
 
| 0x0058
 
 
| 4
 
| 4
| DWORD
+
| 2
| ? Nr of entries in section A
+
|  
 +
| Minor version
 
|-
 
|-
| H10
+
| 6
| 0x005C
+
| 2
| 4
+
|  
| DWORD
+
| Major version
| Offset to section B
+
 
|-
 
|-
| H11
+
| 8
| 0x0060
+
| 2
 +
|
 +
| File number (or file index) <br> The value represents the value of # in data_#
 +
|-
 +
| 10
 +
| 2
 +
|
 +
| Next file number (or next file index) <br> The value represents the value of # in data_#
 +
|-
 +
| 12
 
| 4
 
| 4
| DWORD
+
|  
| Nr of entries in section B
+
| Block size (or cache entry) size
 
|-
 
|-
| H12
+
| 16
| 0x0064
+
 
| 4
 
| 4
| DWORD
+
|  
| Offset to section C
+
| Number of entries
 
|-
 
|-
| H13
+
| 20
| 0x0068
+
 
| 4
 
| 4
| DWORD
+
|  
| Length of section C
+
| Maximum number of entries
 
|-
 
|-
| H14
+
| 24
| 0x006C
+
| 4 x 4 = 16
 +
|
 +
| Array of empty entry counters <br> The counters of empty entries for each type
 +
|-
 +
| 40
 +
| 4 x 4 = 16
 +
|
 +
| Array of last used position (hints) <br> The last used position for each type
 +
|-
 +
| 56
 
| 4
 
| 4
| DWORD
+
|  
| Offset to section D
+
| Updating <br> Value to keep track of updates to the header
 
|-
 
|-
| H15
+
| 60
| 0x0070
+
| 5 x 4 = 20
 +
|
 +
| User
 +
|-
 +
| 80
 +
| 2028 x 4 = 8112
 +
|
 +
| Block allocation bitmap
 +
|}
 +
 
 +
==== Format versions ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 2.0
 +
|
 +
|-
 +
| 2.1
 +
|
 +
|}
 +
 
 +
=== Cache entry ===
 +
The cache entry (struct EntryStore) is 256 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 
| 4
 
| 4
| DWORD
+
|  
| ? Probably the number of entries in the D section header
+
| Hash <br> The hash of the key
 
|-
 
|-
| H16
 
| 0x0074
 
 
| 4
 
| 4
| DWORD
+
| 4
| Length of section D
+
|  
 +
| Next entry cache address <br> The next entry with the same hash or bucket
 
|-
 
|-
| H17
 
| 0x0078
 
 
| 8
 
| 8
| FTIME
+
| 4
| Latest execution time of executable (FILETIME)
+
|  
 +
| Rankings node cache address
 +
|-
 +
| 12
 +
| 4
 +
|
 +
| Reuse count <br> Value that indicates how often this entry was (re-)used
 
|-
 
|-
| H18
 
| 0x0080
 
 
| 16
 
| 16
| ?
+
| 4
| ? Possibly structured as 4 DWORD. Observed values: /0x00000000 0x00000000 0x00000000 0x00000000/, /0x47868c00 0x00000000 0x47860c00 0x00000000/
+
|  
 +
| Refetch count <br> Value that indicates how often this entry was (re-)fetched (or downloaded)
 
|-
 
|-
| H19
+
| 20
| 0x0090
+
 
| 4
 
| 4
| DWORD
+
|  
| Execution counter
+
| Cache entry state
 
|-
 
|-
| H20
+
| 24
| 0x0094
+
| 8
 +
|
 +
| Creation time
 +
|-
 +
| 32
 
| 4
 
| 4
| DWORD?
+
|  
| ? Observed values: 1, 2, 3, 4, 5, 6 (XP)
+
| Key data size
 
|-
 
|-
 +
| 36
 +
| 4
 +
|
 +
| Long key data cache address <br> The value is 0 if no long key data is present
 +
|-
 +
| 40
 +
| 4 x 4 = 16
 +
|
 +
| Array of data stream sizes
 +
|-
 +
| 56
 +
| 4 x 4 = 16
 +
|
 +
| Array of data stream cache addresses
 +
|-
 +
| 72
 +
| 4
 +
|
 +
| Cache entry flags
 +
|-
 +
| 76
 +
| 4 x 4 = 16
 +
|
 +
| Padding
 +
|-
 +
| 92
 +
| 4
 +
|
 +
| Self hash <br> The hash of the first 92 bytes of the cache entry is this used as a checksum?
 +
|-
 +
| 96
 +
| 160
 +
|
 +
| Key data <br> Contains an UTF-8 encoded string with an end-of-string character with the original URL
 
|}
 
|}
  
It's worth noting that the name of a carved prefetch file can be restored using the information in field H5 and H6, and its size can be determined by field H4.
+
==== Array indexes ====
 
+
The data stream array indexes correspond to:
=== Format version ===
+
 
+
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
 
! Value
 
! Value
! Windows version
+
! Description
 
|-
 
|-
| 0x11
+
| 0
| Windows XP, Windows 2003
+
| HTTP response headers
 
|-
 
|-
| 0x17
+
| 1
| Windows Vista, Windows 7
+
| Page contents (Payload)
 
|-
 
|-
| 0x1a
+
| 2
| Windows 8.1
+
|  
 
|-
 
|-
 +
| 3
 +
|
 
|}
 
|}
  
== Section A ==
+
==== Cache entry state ====
This section contains an array with 20 byte (version 17) or 32 byte (version 23 and 26) entry records.
+
{| class="wikitable"
 
+
|-
The actual format and usage of these entry records is currently not known.
+
! Value
 
+
! Identifier
== Section B ==
+
! Description
This section contains an array with 12 byte (version 17, 23 and 26) entry records.
+
|-
 
+
| 0
The actual format and usage of these entry records is currently not known.
+
| ENTRY_NORMAL
 
+
| Normal cache entry
== Section C ==
+
|-
This section contains an array of UTF-16 little-endian formatted strings with end-of-string characters (U+0000).
+
| 1
 
+
| ENTRY_EVICTED
At the end of the section there seems to be alignment padding that can contain remnant values.
+
| The cache entry was recently evicted
 
+
|-
== Section D - Volume information (block) ==
+
| 2
 
+
| ENTRY_DOOMED
Section D contains one or more subsections. The number is (most likely) determined by the DWORD at file offset 0x0070. Each subsection refers to directories on an identified volume.
+
| The cache entry was doomed
 
+
|}
In this section, all offsets are assumed to be counted from the start of the D section.
+
  
 +
==== Cache entry flags ====
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Field
+
! Value
! Offset
+
! Identifier
! Length
+
! Description
! Type
+
! Notes
+
 
|-
 
|-
| DH1
+
| 0x00000001
| +0x0000
+
| PARENT_ENTRY
| 4
+
| Parent cache entry
| DWORD
+
| Offset to volume string (Unicode, terminated by U+0000)
+
 
|-
 
|-
| DH2
+
| 0x00000002
| +0x0004
+
| CHILD_ENTRY
| 4
+
| Child cache entry
| DWORD
+
|}
| Length of volume string (nr of characters, including terminating U+0000)
+
 
 +
=== Rankings node ===
 +
The rankings node (struct RankingsNode) is 36 bytes in size and consists of:
 +
{| class="wikitable"
 
|-
 
|-
| DH3
+
! offset
| +0x0008
+
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 
| 8
 
| 8
| FTIME
+
|  
| (File time)
+
| Last used <br> Contains LRU information?
 
|-
 
|-
| DH4
+
| 8
| +0x0010
+
| 8
| 4
+
|  
| DWORD
+
| Last modified <br> Contains LRU information?
| Volume serial number of volume indicated by volume string
+
 
|-
 
|-
| DH5
+
| 16
| +0x0014
+
 
| 4
 
| 4
| DWORD
+
|  
| ? Offset to section DHS1
+
| Next rankings node cache address
 
|-
 
|-
| DH6
+
| 20
| +0x0018
+
 
| 4
 
| 4
| DWORD
+
|  
| ? Length of section DHS1 (in bytes)
+
| Previous rankings node cache address
 
|-
 
|-
| DH7
+
| 24
| +0x001C
+
 
| 4
 
| 4
| DWORD
+
|  
| ? Offset to section DHS2
+
| Cache entry cache address
 
|-
 
|-
| DH8
+
| 28
| +0x0020
+
 
| 4
 
| 4
| DWORD
+
|  
| ? Nr of strings in section DHS2
+
| Is dirty flag
 
|-
 
|-
| ?
+
| 32
| +0x0024
+
| 4
| ?
+
|  
| ?
+
| Self hash <br> The hash of the first 32 bytes of the rankings node is this used as a checksum?
| ? additional 28 bytes (includes one timestamp?)
+
 
|}
 
|}
  
If all the executables and libraries referenced in the C section are from one single disk volume, there will be only one section in the D section. If multiple volumes are referenced by section C, section D will contain multiple sections.  (A simple way to force this situation is to copy, say, NOTEPAD.EXE to a USB drive, and start it from that volume. The corresponding prefetch file will have one D header referring to, e.g. \DEVICE\HARDDISK1\DP(1)0-0+4 (the USB drive), and one to, e.g. \DEVICE\HARDDISKVOLUME1\ (where the .DLLs and other support files were found).
+
== Data stream ==
 +
The data stream is stored inside the data block file (data_#), for small amounts of data, or stored as a separate file (f_######). The data stream is stored as a [[gzip|gzip file]].
 +
 
 +
Note that the last modification time of the gzip file is set to 0.
 +
 
 +
== Hash ==
 +
The hash algorithm used is referred to as SuperFastHash. A pseudo C implementation:
 +
<pre>
 +
uint32_t SuperFastHash(
 +
          const uint8_t *key,
 +
          size_t key_size )
 +
{
 +
    size_t key_index    = 0;
 +
    size_t remainder    = 0;
 +
    uint32_t hash_value = 0;
 +
    uint32_t temp_value = 0;
 +
 
 +
    if( ( key == NULL ) || ( key_siz 0 ) )
 +
    {
 +
      return( 0 );
 +
    }
 +
    remainder = key_size % 4;
 +
    key_size -= remainder;
 +
 
 +
    for( key_index = 0;
 +
        key_index < key_size;
 +
        key_index += 4 )
 +
    {
 +
        hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
 +
        temp_value  = key[ key_index + 2 ] + ( key[ key_index + 3 ] << 8 );
 +
 
 +
        temp_value = ( temp_value << 11 ) ^ hash_value;
 +
 
 +
        hash_value  = ( hash_value << 16 ) ^ temp_value;
 +
        hash_value += hash_value >> 11;
 +
    }
 +
 
 +
    switch( remainder )
 +
    {
 +
        case 3:
 +
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
 +
            hash_value ^= hash_value<< 16;
 +
            hash_value ^= key[ key_index + 2 ] << 18;
 +
            hash_value += hash_value >> 11;
 +
            break;
 +
 
 +
        case 2:
 +
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
 +
            hash_value ^= hash_value << 11;
 +
            hash_value += hash_value >> 17;
 +
            break;
 +
 
 +
        case 1:
 +
            hash_value += key[ key_index ];
 +
            hash_value ^= hash_value << 10;
 +
            hash_value += hash_value >> 1;
 +
            break;
 +
    }
 +
 
 +
    /* Force "avalanching" of final 127 bits */
 +
    hash_value ^= hash_value << 3;
 +
    hash_value += hash_value >> 5;
 +
    hash_value ^= hash_value << 4;
 +
    hash_value += hash_value >> 17;
 +
    hash_value ^= hash_value << 25;
 +
    hash_value += hash_value >> 6;
 +
 
 +
    return hash_value;
 +
}
 +
</pre>
  
 
== See Also ==
 
== See Also ==
* [[Prefetch]]
+
* [[Google Chrome]]
 +
* [[gzip]]
  
 
== External Links ==
 
== External Links ==
 +
* [http://www.chromium.org/developers/design-documents/network-stack/disk-cache Disk Cache], The Chromium Projects
 +
 +
[[Category:File Formats]]

Revision as of 09:17, 22 June 2014

Information icon.png

Please help to improve this article by expanding it.
Further information might be found on the discussion page.

Cache files

The cache is stored in multiple:

Filename Description
index The index file
data_# Data block files
f_###### (Separate) data stream file

Cache address

The cache address is 4 bytes in size and consists of:

offset size value description
If file type is 0 (Separate file)
0.0 28 bits File number
The value represents the value of # in f_######
Else
0.0 16 bits Block number
2.0 8 bits File number (or file selector)
The value represents the value of # in data_#
3.0 2 bits Block size
The number of contiguous blocks where 0 represents 1 block and 3 represents 4 blocks.
3.2 2 bits Reserved
Common
3.4 3 bits File type
3.7 1 bit Initialized flag

File types

Value Description
0 (Separate) data stream file
1 (Rankings) block data file (36 byte block data file)
2 256 byte block data file
3 1024 byte block data file
4 4096 byte block data file
6 Unknown; seen on Mac OS X 0x6f430074

Examples

Value Description
0x00000000 Not initialized
0x8000002a Data stream file: f_00002a
0xa0010003 Block data file: data_1, block number 3, 1 block of size

Index file format (index)

Overview:

  • File header
  • least recently used (LRU) data (or eviction control data)
  • index table

File header

The index file header (struct IndexHeader) is 256 bytes in size and consists of:

offset size value description
0 4 "\xc3\xca\x03\xc1" Signature
4 2 Minor version
6 2 Major version
8 4 Number of entries
12 4 Stored data size
16 4 Last created file number
The value represents the value of # in f_######
20 4 Dirty flag
Identifier for all entries being changed
24 4 Usage statistics data cache address
28 4 Table size
Where 0 represents 0x10000 (is this the same as the file size?)
32 4 Signals a previous crash
36 4 Identifier of an ongoing test or experiment
40 8 Creation time
48 52 x 8 = 208 Padding
Contains 0-byte values

Format versions

Value Description
2.0
2.1

Least recently used (LRU) data

The least recently used (LRU) data (struct LruData) is 112 bytes in size and consists of:

offset size value description
0 2 x 4 = 8 Padding
8 4 Filled flag
Value to indicate if when the cache was filled
12 5 x 4 = 20 Array of sizes
32 5 x 4 = 20 Array of head cache addresses
52 5 x 4 = 20 Array of tail cache addresses
72 4 Transaction cache address
Value to indicate an in-flight operation
76 4 Operation
The in-flight operation
80 4 Operations list
The in-flight operations list
84 7 x 4 = 28 Padding
Contains 0-byte values

Array indexes

The array indexes correspond to the file types.

Value Description
0 Separate file
1 (Rankings) block data file
2 256 byte block data file
3 1024 byte block data file
4 4096 byte block data file

Index table

The index table is an array of cache addresses.

Data block file format (data_#)

Overview:

  • File header
  • array of blocks

File header

The index file header (struct BlockFileHeader) is 8192 bytes in size and consists of:

offset size value description
0 4 "\xc3\xca\x04\xc1" Signature
4 2 Minor version
6 2 Major version
8 2 File number (or file index)
The value represents the value of # in data_#
10 2 Next file number (or next file index)
The value represents the value of # in data_#
12 4 Block size (or cache entry) size
16 4 Number of entries
20 4 Maximum number of entries
24 4 x 4 = 16 Array of empty entry counters
The counters of empty entries for each type
40 4 x 4 = 16 Array of last used position (hints)
The last used position for each type
56 4 Updating
Value to keep track of updates to the header
60 5 x 4 = 20 User
80 2028 x 4 = 8112 Block allocation bitmap

Format versions

Value Description
2.0
2.1

Cache entry

The cache entry (struct EntryStore) is 256 bytes in size and consists of:

offset size value description
0 4 Hash
The hash of the key
4 4 Next entry cache address
The next entry with the same hash or bucket
8 4 Rankings node cache address
12 4 Reuse count
Value that indicates how often this entry was (re-)used
16 4 Refetch count
Value that indicates how often this entry was (re-)fetched (or downloaded)
20 4 Cache entry state
24 8 Creation time
32 4 Key data size
36 4 Long key data cache address
The value is 0 if no long key data is present
40 4 x 4 = 16 Array of data stream sizes
56 4 x 4 = 16 Array of data stream cache addresses
72 4 Cache entry flags
76 4 x 4 = 16 Padding
92 4 Self hash
The hash of the first 92 bytes of the cache entry is this used as a checksum?
96 160 Key data
Contains an UTF-8 encoded string with an end-of-string character with the original URL

Array indexes

The data stream array indexes correspond to:

Value Description
0 HTTP response headers
1 Page contents (Payload)
2
3

Cache entry state

Value Identifier Description
0 ENTRY_NORMAL Normal cache entry
1 ENTRY_EVICTED The cache entry was recently evicted
2 ENTRY_DOOMED The cache entry was doomed

Cache entry flags

Value Identifier Description
0x00000001 PARENT_ENTRY Parent cache entry
0x00000002 CHILD_ENTRY Child cache entry

Rankings node

The rankings node (struct RankingsNode) is 36 bytes in size and consists of:

offset size value description
0 8 Last used
Contains LRU information?
8 8 Last modified
Contains LRU information?
16 4 Next rankings node cache address
20 4 Previous rankings node cache address
24 4 Cache entry cache address
28 4 Is dirty flag
32 4 Self hash
The hash of the first 32 bytes of the rankings node is this used as a checksum?

Data stream

The data stream is stored inside the data block file (data_#), for small amounts of data, or stored as a separate file (f_######). The data stream is stored as a gzip file.

Note that the last modification time of the gzip file is set to 0.

Hash

The hash algorithm used is referred to as SuperFastHash. A pseudo C implementation:

uint32_t SuperFastHash(
          const uint8_t *key,
          size_t key_size )
{ 
    size_t key_index    = 0;
    size_t remainder    = 0;
    uint32_t hash_value = 0;
    uint32_t temp_value = 0;

    if( ( key == NULL ) || ( key_siz 0 ) )
    {
       return( 0 );
    }
    remainder = key_size % 4;
    key_size -= remainder;

    for( key_index = 0;
         key_index < key_size;
         key_index += 4 )
    {
        hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
        temp_value  = key[ key_index + 2 ] + ( key[ key_index + 3 ] << 8 );

        temp_value = ( temp_value << 11 ) ^ hash_value;

        hash_value  = ( hash_value << 16 ) ^ temp_value;
        hash_value += hash_value >> 11;
    }

    switch( remainder )
    {
        case 3:
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
            hash_value ^= hash_value<< 16;
            hash_value ^= key[ key_index + 2 ] << 18;
            hash_value += hash_value >> 11;
            break;

        case 2:
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
            hash_value ^= hash_value << 11;
            hash_value += hash_value >> 17;
            break;

        case 1:
            hash_value += key[ key_index ];
            hash_value ^= hash_value << 10;
            hash_value += hash_value >> 1;
            break;
    }

    /* Force "avalanching" of final 127 bits */
    hash_value ^= hash_value << 3;
    hash_value += hash_value >> 5;
    hash_value ^= hash_value << 4;
    hash_value += hash_value >> 17;
    hash_value ^= hash_value << 25;
    hash_value += hash_value >> 6;

    return hash_value;
}

See Also

External Links