Difference between pages "Windows Prefetch File Format" and "Chrome Disk Cache Format"

From ForensicsWiki
(Difference between pages)
Jump to: navigation, search
(Header)
 
(Data stream)
 
Line 1: Line 1:
 
{{expand}}
 
{{expand}}
  
A Windows Prefetch file consists of one file header and multiple file sections with different content. Not all content has an obvious forensic value.
+
== Cache files ==
 +
The cache is stored in multiple:
 +
{| class="wikitable"
 +
|-
 +
! Filename
 +
! Description
 +
|-
 +
| index
 +
| The index file
 +
|-
 +
| data_#
 +
| Data block files
 +
|-
 +
| f_######
 +
| (Separate) data stream file
 +
|}
  
As far as have been possible to ascertain, there is no public description of the format. The description below has been synthesised from examination
+
== Cache address ==
of multiple prefetch files.
+
The cache address is 4 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| <i>If file type is 0 (Separate file)</i>
 +
|
 +
|
 +
|
 +
|-
 +
| 0.0
 +
| 28 bits
 +
|
 +
| File number <br> The value represents the value of # in f_######
 +
|-
 +
| <i>Else</i>
 +
|
 +
|
 +
|
 +
|-
 +
| 0.0
 +
| 16 bits
 +
|
 +
| Block number
 +
|-
 +
| 2.0
 +
| 8 bits
 +
|
 +
| File number (or file selector) <br> The value represents the value of # in data_#
 +
|-
 +
| 3.0
 +
| 2 bits
 +
|
 +
| Block size <br> The number of contiguous blocks where 0 represents 1 block and 3 represents 4 blocks.
 +
|-
 +
| 3.2
 +
| 2 bits
 +
|
 +
| Reserved
 +
|-
 +
| <i>Common</i>
 +
|
 +
|
 +
|
 +
|-
 +
| 3.4
 +
| 3 bits
 +
|
 +
| File type
 +
|-
 +
| 3.7
 +
| 1 bit
 +
|
 +
| Initialized flag
 +
|}
  
== Characteristics ==
+
=== File types ===
Integer values are stored in little-endian.
+
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 0
 +
| (Separate) data stream file
 +
|-
 +
| 1
 +
| (Rankings) block data file (36 byte block data file)
 +
|-
 +
| 2
 +
| 256 byte block data file
 +
|-
 +
| 3
 +
| 1024 byte block data file
 +
|-
 +
| 4
 +
| 4096 byte block data file
 +
|-
 +
|
 +
|
 +
|-
 +
| 6
 +
| Unknown; seen on Mac OS  X 0x6f430074
 +
|}
  
Strings are stored as UTF-16 little-endian without a byte-order-mark (BOM).
+
==== Examples ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 0x00000000
 +
| Not initialized
 +
|-
 +
| 0x8000002a
 +
| Data stream file: f_00002a
 +
|-
 +
| 0xa0010003
 +
| Block data file: data_1, block number 3, 1 block of size
 +
|}
  
Timestamps are stored as Windows Filetime in UTC.
+
== Index file format (index) ==
 +
Overview:
 +
* File header
 +
* least recently used (LRU) data (or eviction control data)
 +
* index table
  
== Header ==
+
=== File header ===
 +
The index file header (struct IndexHeader) is 256 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 4
 +
| "\xc3\xca\x03\xc1"
 +
| Signature
 +
|-
 +
| 4
 +
| 2
 +
|
 +
| Minor version
 +
|-
 +
| 6
 +
| 2
 +
|
 +
| Major version
 +
|-
 +
| 8
 +
| 4
 +
|
 +
| Number of entries
 +
|-
 +
| 12
 +
| 4
 +
|
 +
| Stored data size
 +
|-
 +
| 16
 +
| 4
 +
|
 +
| Last created file number <br> The value represents the value of # in f_######
 +
|-
 +
| 20
 +
| 4
 +
|
 +
| Dirty flag <br> Identifier for all entries being changed
 +
|-
 +
| 24
 +
| 4
 +
|
 +
| Usage statistics data cache address
 +
|-
 +
| 28
 +
| 4
 +
|
 +
| Table size <br> Where 0 represents 0x10000 (is this the same as the file size?)
 +
|-
 +
| 32
 +
| 4
 +
|
 +
| Signals a previous crash
 +
|-
 +
| 36
 +
| 4
 +
|
 +
| Identifier of an ongoing test or experiment
 +
|-
 +
| 40
 +
| 8
 +
|
 +
| Creation time
 +
|-
 +
| 48
 +
| 52 x 8 = 208
 +
|
 +
| Padding <br> Contains 0-byte values
 +
|}
  
This format has been observed on Windows XP, ...  will need to be modified for Vista/Win7 format
+
==== Format versions ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
|2.0
 +
|
 +
|-
 +
| 2.1
 +
|
 +
|}
  
 +
=== Least recently used (LRU) data ===
 +
The least recently used (LRU) data (struct LruData) is 112 bytes in size and consists of:
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Field
+
! offset
! Offset
+
! size
! Length
+
! value
! Type
+
! description
! Notes
+
 
|-
 
|-
| H1
+
| 0
| 0x0000
+
| 2 x 4 = 8
 +
|
 +
| Padding
 +
|-
 +
| 8
 
| 4
 
| 4
| DWORD
+
|  
| Format version (see format version section below)
+
| Filled flag <br> Value to indicate if when the cache was filled
 +
|-
 +
| 12
 +
| 5 x 4 = 20
 +
|
 +
| Array of sizes
 +
|-
 +
| 32
 +
| 5 x 4 = 20
 +
|
 +
| Array of head cache addresses
 +
|-
 +
| 52
 +
| 5 x 4 = 20
 +
|
 +
| Array of tail cache addresses
 
|-
 
|-
| H2
+
| 72
| 0x0004
+
 
| 4
 
| 4
| DWORD
+
|  
| Signature 'SCCA' (or in hexadecimal representation 0x53 0x43 0x43 0x4)
+
| Transaction cache address <br> Value to indicate an in-flight operation
 
|-
 
|-
| H3
+
| 76
| 0x0008
+
 
| 4
 
| 4
| DWORD?
+
|  
| Unknown - Values observed: 0x0F - Windows XP, 0x11 - Windows 7, Windows 8.1
+
| Operation <br> The in-flight operation
 
|-
 
|-
| H4
+
| 80
| 0x000C
+
 
| 4
 
| 4
| DWORD
+
|  
| Prefetch file length.
+
| Operations list <br> The in-flight operations list
 
|-
 
|-
| H5
+
| 84
|0x0010
+
| 7 x 4 = 28
| 60
+
|  
| USTR
+
| Padding <br> Contains 0-byte values
| Name of executable as Unicode string, truncated after 29 characters, if necessary, and terminated by an end-of-string character (U+0000). As it appears in the prefetch file file name.
+
|}
 +
 
 +
=== Array indexes ===
 +
The array indexes correspond to the file types.
 +
{| class="wikitable"
 
|-
 
|-
| H6
+
! Value
|0x004C
+
! Description
|4
+
|DWORD
+
|The prefetch hash, as it appears in the prefetch file name.
+
 
|-
 
|-
| H7
+
| 0
|0x0050
+
| Separate file
|4
+
|?
+
| Unknown (flags)? Values observed: 0 for almost all prefetch files (XP); 1 for NTOSBOOT-B00DFAAD.pf (XP)
+
 
|-
 
|-
 +
| 1
 +
| (Rankings) block data file
 +
|-
 +
| 2
 +
| 256 byte block data file
 +
|-
 +
| 3
 +
| 1024 byte block data file
 +
|-
 +
| 4
 +
| 4096 byte block data file
 
|}
 
|}
  
The following part of the header is version dependent. Below the structure for format version 17.
+
=== Index table ===
 +
The index table is an array of cache addresses.
 +
 
 +
== Data block file format (data_#) ==
 +
Overview:
 +
* File header
 +
* array of blocks
  
 +
=== File header ===
 +
The index file header (struct BlockFileHeader) is 8192 bytes in size and consists of:
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Field
+
! offset
! Offset
+
! size
! Length
+
! value
! Type
+
! description
! Notes
+
 
|-
 
|-
| H8
+
| 0
| 0x0054
+
 
| 4
 
| 4
| DWORD
+
| "\xc3\xca\x04\xc1"
| Offset to section A
+
| Signature
 
|-
 
|-
| H9
 
| 0x0058
 
 
| 4
 
| 4
| DWORD
+
| 2
| ? Nr of entries in section A
+
|  
 +
| Minor version
 
|-
 
|-
| H10
+
| 6
| 0x005C
+
| 2
| 4
+
|  
| DWORD
+
| Major version
| Offset to section B
+
 
|-
 
|-
| H11
+
| 8
| 0x0060
+
| 2
 +
|
 +
| File number (or file index) <br> The value represents the value of # in data_#
 +
|-
 +
| 10
 +
| 2
 +
|
 +
| Next file number (or next file index) <br> The value represents the value of # in data_#
 +
|-
 +
| 12
 
| 4
 
| 4
| DWORD
+
|  
| Nr of entries in section B
+
| Block size (or cache entry) size
 
|-
 
|-
| H12
+
| 16
| 0x0064
+
 
| 4
 
| 4
| DWORD
+
|  
| Offset to section C
+
| Number of entries
 
|-
 
|-
| H13
+
| 20
| 0x0068
+
 
| 4
 
| 4
| DWORD
+
|  
| Length of section C
+
| Maximum number of entries
 
|-
 
|-
| H14
+
| 24
| 0x006C
+
| 4 x 4 = 16
 +
|
 +
| Array of empty entry counters <br> The counters of empty entries for each type
 +
|-
 +
| 40
 +
| 4 x 4 = 16
 +
|
 +
| Array of last used position (hints) <br> The last used position for each type
 +
|-
 +
| 56
 
| 4
 
| 4
| DWORD
+
|  
| Offset to section D
+
| Updating <br> Value to keep track of updates to the header
 
|-
 
|-
| H15
+
| 60
| 0x0070
+
| 5 x 4 = 20
 +
|
 +
| User
 +
|-
 +
| 80
 +
| 2028 x 4 = 8112
 +
|
 +
| Block allocation bitmap
 +
|}
 +
 
 +
==== Format versions ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 2.0
 +
|
 +
|-
 +
| 2.1
 +
|
 +
|}
 +
 
 +
=== Cache entry ===
 +
The cache entry (struct EntryStore) is 256 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 
| 4
 
| 4
| DWORD
+
|  
| ? Probably the number of entries in the D section header
+
| Hash <br> The hash of the key
 
|-
 
|-
| H16
 
| 0x0074
 
 
| 4
 
| 4
| DWORD
+
| 4
| Length of section D
+
|  
 +
| Next entry cache address <br> The next entry with the same hash or bucket
 
|-
 
|-
| H17
 
| 0x0078
 
 
| 8
 
| 8
| FTIME
+
| 4
| Latest execution time of executable (FILETIME)
+
|  
 +
| Rankings node cache address
 +
|-
 +
| 12
 +
| 4
 +
|
 +
| Reuse count <br> Value that indicates how often this entry was (re-)used
 
|-
 
|-
| H18
 
| 0x0080
 
 
| 16
 
| 16
| ?
+
| 4
| ? Possibly structured as 4 DWORD. Observed values: /0x00000000 0x00000000 0x00000000 0x00000000/, /0x47868c00 0x00000000 0x47860c00 0x00000000/
+
|  
 +
| Refetch count <br> Value that indicates how often this entry was (re-)fetched (or downloaded)
 
|-
 
|-
| H19
+
| 20
| 0x0090
+
 
| 4
 
| 4
| DWORD
+
|  
| Execution counter
+
| Cache entry state
 
|-
 
|-
| H20
+
| 24
| 0x0094
+
| 8
 +
|
 +
| Creation time
 +
|-
 +
| 32
 
| 4
 
| 4
| DWORD?
+
|  
| ? Observed values: 1, 2, 3, 4, 5, 6 (XP)
+
| Key data size
 
|-
 
|-
 +
| 36
 +
| 4
 +
|
 +
| Long key data cache address <br> The value is 0 if no long key data is present
 +
|-
 +
| 40
 +
| 4 x 4 = 16
 +
|
 +
| Array of data stream sizes
 +
|-
 +
| 56
 +
| 4 x 4 = 16
 +
|
 +
| Array of data stream cache addresses
 +
|-
 +
| 72
 +
| 4
 +
|
 +
| Cache entry flags
 +
|-
 +
| 76
 +
| 4 x 4 = 16
 +
|
 +
| Padding
 +
|-
 +
| 92
 +
| 4
 +
|
 +
| Self hash <br> The hash of the first 92 bytes of the cache entry is this used as a checksum?
 +
|-
 +
| 96
 +
| 160
 +
|
 +
| Key data <br> Contains an UTF-8 encoded string with an end-of-string character with the original URL
 
|}
 
|}
  
It's worth noting that the name of a carved prefetch file can be restored using the information in field H5 and H6, and its size can be determined by field H4.
+
==== Array indexes ====
 
+
The data stream array indexes correspond to:
=== Format version ===
+
 
+
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
 
! Value
 
! Value
! Windows version
+
! Description
 
|-
 
|-
| 0x11
+
| 0
| Windows XP, Windows 2003
+
| HTTP response headers
 
|-
 
|-
| 0x17
+
| 1
| Windows Vista, Windows 7
+
| Page contents (Payload)
 
|-
 
|-
| 0x1a
+
| 2
| Windows 8.1
+
|  
 
|-
 
|-
 +
| 3
 +
|
 
|}
 
|}
  
== Section A ==
+
==== Cache entry state ====
This section contains an array with 20 byte (version 17) or 32 byte (version 23 and 26) entry records.
+
{| class="wikitable"
 
+
|-
The actual format and usage of these entry records is currently not known.
+
! Value
 
+
! Identifier
== Section B ==
+
! Description
This section contains an array with 12 byte (version 17, 23 and 26) entry records.
+
|-
 
+
| 0
The actual format and usage of these entry records is currently not known.
+
| ENTRY_NORMAL
 
+
| Normal cache entry
== Section C ==
+
|-
This section contains an array of UTF-16 little-endian formatted strings with end-of-string characters (U+0000).
+
| 1
 
+
| ENTRY_EVICTED
At the end of the section there seems to be alignment padding that can contain remnant values.
+
| The cache entry was recently evicted
 
+
|-
== Section D - Volume information (block) ==
+
| 2
 
+
| ENTRY_DOOMED
Section D contains one or more subsections. The number is (most likely) determined by the DWORD at file offset 0x0070. Each subsection refers to directories on an identified volume.
+
| The cache entry was doomed
 
+
|}
In this section, all offsets are assumed to be counted from the start of the D section.
+
  
 +
==== Cache entry flags ====
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Field
+
! Value
! Offset
+
! Identifier
! Length
+
! Description
! Type
+
! Notes
+
 
|-
 
|-
| DH1
+
| 0x00000001
| +0x0000
+
| PARENT_ENTRY
| 4
+
| Parent cache entry
| DWORD
+
| Offset to volume string (Unicode, terminated by U+0000)
+
 
|-
 
|-
| DH2
+
| 0x00000002
| +0x0004
+
| CHILD_ENTRY
| 4
+
| Child cache entry
| DWORD
+
|}
| Length of volume string (nr of characters, including terminating U+0000)
+
 
 +
=== Rankings node ===
 +
The rankings node (struct RankingsNode) is 36 bytes in size and consists of:
 +
{| class="wikitable"
 
|-
 
|-
| DH3
+
! offset
| +0x0008
+
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 
| 8
 
| 8
| FTIME
+
|  
| (File time)
+
| Last used <br> Contains LRU information?
 
|-
 
|-
| DH4
+
| 8
| +0x0010
+
| 8
| 4
+
|  
| DWORD
+
| Last modified <br> Contains LRU information?
| Volume serial number of volume indicated by volume string
+
 
|-
 
|-
| DH5
+
| 16
| +0x0014
+
 
| 4
 
| 4
| DWORD
+
|  
| ? Offset to section DHS1
+
| Next rankings node cache address
 
|-
 
|-
| DH6
+
| 20
| +0x0018
+
 
| 4
 
| 4
| DWORD
+
|  
| ? Length of section DHS1 (in bytes)
+
| Previous rankings node cache address
 
|-
 
|-
| DH7
+
| 24
| +0x001C
+
 
| 4
 
| 4
| DWORD
+
|  
| ? Offset to section DHS2
+
| Cache entry cache address
 
|-
 
|-
| DH8
+
| 28
| +0x0020
+
 
| 4
 
| 4
| DWORD
+
|  
| ? Nr of strings in section DHS2
+
| Is dirty flag
 
|-
 
|-
| ?
+
| 32
| +0x0024
+
| 4
| ?
+
|  
| ?
+
| Self hash <br> The hash of the first 32 bytes of the rankings node is this used as a checksum?
| ? additional 28 bytes (includes one timestamp?)
+
 
|}
 
|}
  
If all the executables and libraries referenced in the C section are from one single disk volume, there will be only one section in the D section. If multiple volumes are referenced by section C, section D will contain multiple sections.  (A simple way to force this situation is to copy, say, NOTEPAD.EXE to a USB drive, and start it from that volume. The corresponding prefetch file will have one D header referring to, e.g. \DEVICE\HARDDISK1\DP(1)0-0+4 (the USB drive), and one to, e.g. \DEVICE\HARDDISKVOLUME1\ (where the .DLLs and other support files were found).
+
== Data stream ==
 +
The data stream is stored inside the data block file (data_#), for small amounts of data, or stored as a separate file (f_######). The data stream is stored as a [[gzip|gzip file]].
 +
 
 +
Note that the last modification time of the gzip file is set to 0.
 +
 
 +
== Hash ==
 +
The hash algorithm used is referred to as SuperFastHash. A pseudo C implementation:
 +
<pre>
 +
uint32_t SuperFastHash(
 +
          const uint8_t *key,
 +
          size_t key_size )
 +
{
 +
    size_t key_index    = 0;
 +
    size_t remainder    = 0;
 +
    uint32_t hash_value = 0;
 +
    uint32_t temp_value = 0;
 +
 
 +
    if( ( key == NULL ) || ( key_siz 0 ) )
 +
    {
 +
      return( 0 );
 +
    }
 +
    remainder = key_size % 4;
 +
    key_size -= remainder;
 +
 
 +
    for( key_index = 0;
 +
        key_index < key_size;
 +
        key_index += 4 )
 +
    {
 +
        hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
 +
        temp_value  = key[ key_index + 2 ] + ( key[ key_index + 3 ] << 8 );
 +
 
 +
        temp_value = ( temp_value << 11 ) ^ hash_value;
 +
 
 +
        hash_value  = ( hash_value << 16 ) ^ temp_value;
 +
        hash_value += hash_value >> 11;
 +
    }
 +
 
 +
    switch( remainder )
 +
    {
 +
        case 3:
 +
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
 +
            hash_value ^= hash_value<< 16;
 +
            hash_value ^= key[ key_index + 2 ] << 18;
 +
            hash_value += hash_value >> 11;
 +
            break;
 +
 
 +
        case 2:
 +
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
 +
            hash_value ^= hash_value << 11;
 +
            hash_value += hash_value >> 17;
 +
            break;
 +
 
 +
        case 1:
 +
            hash_value += key[ key_index ];
 +
            hash_value ^= hash_value << 10;
 +
            hash_value += hash_value >> 1;
 +
            break;
 +
    }
 +
 
 +
    /* Force "avalanching" of final 127 bits */
 +
    hash_value ^= hash_value << 3;
 +
    hash_value += hash_value >> 5;
 +
    hash_value ^= hash_value << 4;
 +
    hash_value += hash_value >> 17;
 +
    hash_value ^= hash_value << 25;
 +
    hash_value += hash_value >> 6;
 +
 
 +
    return hash_value;
 +
}
 +
</pre>
  
 
== See Also ==
 
== See Also ==
* [[Prefetch]]
+
* [[Google Chrome]]
 +
* [[gzip]]
  
 
== External Links ==
 
== External Links ==
 +
* [http://www.chromium.org/developers/design-documents/network-stack/disk-cache Disk Cache], The Chromium Projects
 +
 +
[[Category:File Formats]]

Revision as of 05:17, 22 June 2014

Information icon.png

Please help to improve this article by expanding it.
Further information might be found on the discussion page.

Cache files

The cache is stored in multiple:

Filename Description
index The index file
data_# Data block files
f_###### (Separate) data stream file

Cache address

The cache address is 4 bytes in size and consists of:

offset size value description
If file type is 0 (Separate file)
0.0 28 bits File number
The value represents the value of # in f_######
Else
0.0 16 bits Block number
2.0 8 bits File number (or file selector)
The value represents the value of # in data_#
3.0 2 bits Block size
The number of contiguous blocks where 0 represents 1 block and 3 represents 4 blocks.
3.2 2 bits Reserved
Common
3.4 3 bits File type
3.7 1 bit Initialized flag

File types

Value Description
0 (Separate) data stream file
1 (Rankings) block data file (36 byte block data file)
2 256 byte block data file
3 1024 byte block data file
4 4096 byte block data file
6 Unknown; seen on Mac OS X 0x6f430074

Examples

Value Description
0x00000000 Not initialized
0x8000002a Data stream file: f_00002a
0xa0010003 Block data file: data_1, block number 3, 1 block of size

Index file format (index)

Overview:

  • File header
  • least recently used (LRU) data (or eviction control data)
  • index table

File header

The index file header (struct IndexHeader) is 256 bytes in size and consists of:

offset size value description
0 4 "\xc3\xca\x03\xc1" Signature
4 2 Minor version
6 2 Major version
8 4 Number of entries
12 4 Stored data size
16 4 Last created file number
The value represents the value of # in f_######
20 4 Dirty flag
Identifier for all entries being changed
24 4 Usage statistics data cache address
28 4 Table size
Where 0 represents 0x10000 (is this the same as the file size?)
32 4 Signals a previous crash
36 4 Identifier of an ongoing test or experiment
40 8 Creation time
48 52 x 8 = 208 Padding
Contains 0-byte values

Format versions

Value Description
2.0
2.1

Least recently used (LRU) data

The least recently used (LRU) data (struct LruData) is 112 bytes in size and consists of:

offset size value description
0 2 x 4 = 8 Padding
8 4 Filled flag
Value to indicate if when the cache was filled
12 5 x 4 = 20 Array of sizes
32 5 x 4 = 20 Array of head cache addresses
52 5 x 4 = 20 Array of tail cache addresses
72 4 Transaction cache address
Value to indicate an in-flight operation
76 4 Operation
The in-flight operation
80 4 Operations list
The in-flight operations list
84 7 x 4 = 28 Padding
Contains 0-byte values

Array indexes

The array indexes correspond to the file types.

Value Description
0 Separate file
1 (Rankings) block data file
2 256 byte block data file
3 1024 byte block data file
4 4096 byte block data file

Index table

The index table is an array of cache addresses.

Data block file format (data_#)

Overview:

  • File header
  • array of blocks

File header

The index file header (struct BlockFileHeader) is 8192 bytes in size and consists of:

offset size value description
0 4 "\xc3\xca\x04\xc1" Signature
4 2 Minor version
6 2 Major version
8 2 File number (or file index)
The value represents the value of # in data_#
10 2 Next file number (or next file index)
The value represents the value of # in data_#
12 4 Block size (or cache entry) size
16 4 Number of entries
20 4 Maximum number of entries
24 4 x 4 = 16 Array of empty entry counters
The counters of empty entries for each type
40 4 x 4 = 16 Array of last used position (hints)
The last used position for each type
56 4 Updating
Value to keep track of updates to the header
60 5 x 4 = 20 User
80 2028 x 4 = 8112 Block allocation bitmap

Format versions

Value Description
2.0
2.1

Cache entry

The cache entry (struct EntryStore) is 256 bytes in size and consists of:

offset size value description
0 4 Hash
The hash of the key
4 4 Next entry cache address
The next entry with the same hash or bucket
8 4 Rankings node cache address
12 4 Reuse count
Value that indicates how often this entry was (re-)used
16 4 Refetch count
Value that indicates how often this entry was (re-)fetched (or downloaded)
20 4 Cache entry state
24 8 Creation time
32 4 Key data size
36 4 Long key data cache address
The value is 0 if no long key data is present
40 4 x 4 = 16 Array of data stream sizes
56 4 x 4 = 16 Array of data stream cache addresses
72 4 Cache entry flags
76 4 x 4 = 16 Padding
92 4 Self hash
The hash of the first 92 bytes of the cache entry is this used as a checksum?
96 160 Key data
Contains an UTF-8 encoded string with an end-of-string character with the original URL

Array indexes

The data stream array indexes correspond to:

Value Description
0 HTTP response headers
1 Page contents (Payload)
2
3

Cache entry state

Value Identifier Description
0 ENTRY_NORMAL Normal cache entry
1 ENTRY_EVICTED The cache entry was recently evicted
2 ENTRY_DOOMED The cache entry was doomed

Cache entry flags

Value Identifier Description
0x00000001 PARENT_ENTRY Parent cache entry
0x00000002 CHILD_ENTRY Child cache entry

Rankings node

The rankings node (struct RankingsNode) is 36 bytes in size and consists of:

offset size value description
0 8 Last used
Contains LRU information?
8 8 Last modified
Contains LRU information?
16 4 Next rankings node cache address
20 4 Previous rankings node cache address
24 4 Cache entry cache address
28 4 Is dirty flag
32 4 Self hash
The hash of the first 32 bytes of the rankings node is this used as a checksum?

Data stream

The data stream is stored inside the data block file (data_#), for small amounts of data, or stored as a separate file (f_######). The data stream is stored as a gzip file.

Note that the last modification time of the gzip file is set to 0.

Hash

The hash algorithm used is referred to as SuperFastHash. A pseudo C implementation:

uint32_t SuperFastHash(
          const uint8_t *key,
          size_t key_size )
{ 
    size_t key_index    = 0;
    size_t remainder    = 0;
    uint32_t hash_value = 0;
    uint32_t temp_value = 0;

    if( ( key == NULL ) || ( key_siz 0 ) )
    {
       return( 0 );
    }
    remainder = key_size % 4;
    key_size -= remainder;

    for( key_index = 0;
         key_index < key_size;
         key_index += 4 )
    {
        hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
        temp_value  = key[ key_index + 2 ] + ( key[ key_index + 3 ] << 8 );

        temp_value = ( temp_value << 11 ) ^ hash_value;

        hash_value  = ( hash_value << 16 ) ^ temp_value;
        hash_value += hash_value >> 11;
    }

    switch( remainder )
    {
        case 3:
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
            hash_value ^= hash_value<< 16;
            hash_value ^= key[ key_index + 2 ] << 18;
            hash_value += hash_value >> 11;
            break;

        case 2:
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
            hash_value ^= hash_value << 11;
            hash_value += hash_value >> 17;
            break;

        case 1:
            hash_value += key[ key_index ];
            hash_value ^= hash_value << 10;
            hash_value += hash_value >> 1;
            break;
    }

    /* Force "avalanching" of final 127 bits */
    hash_value ^= hash_value << 3;
    hash_value += hash_value >> 5;
    hash_value ^= hash_value << 4;
    hash_value += hash_value >> 17;
    hash_value ^= hash_value << 25;
    hash_value += hash_value >> 6;

    return hash_value;
}

See Also

External Links