Difference between pages "Prefetch" and "Chrome Disk Cache Format"

From ForensicsWiki
(Difference between pages)
Jump to: navigation, search
 
(Data stream)
 
Line 1: Line 1:
{{Expand}}
+
{{expand}}
Windows Prefetch files, introduced in [[Windows|Windows XP]], are designed to speed up the application startup process. Prefetch files contain the name of the executable, a Unicode list of DLLs used by that executable, a count of how many times the executable has been run, and a timestamp indicating the last time the program was run. Although Prefetch is present in Windows 2003, by default it is only enabled for boot prefetching. The feature is also found in [[Windows|Windows Vista]], where it has been augmented with [[SuperFetch]], [[ReadyBoot]], and [[ReadyBoost]]. For SSD drives Prefetch is disabled by default [http://blogs.msdn.com/b/e7/archive/2009/05/05/support-and-q-a-for-solid-state-drives-and.aspx].
+
  
Up to 128 Prefetch files are stored in the <tt>%SystemRoot%\Prefetch</tt> directory [http://blogs.msdn.com/ryanmy/archive/2005/05/25/421882.aspx]. Each file in that directory should contain the name of the application, a dash, and then an eight character hash of the location from which that application was run, and a <tt>.pf</tt> extension. The filenames should be all uppercase except for the extension. The format of hashes is not known. A sample filename for [[md5deep]] would look like: <tt>MD5DEEP.EXE-4F89AB0C.pf</tt>. If an application is run from two different locations on the drive (i.e. the user runs <tt>C:\md5deep.exe</tt> and then <tt>C:\Apps\Hashing\md5deep.exe</tt>), there will be two different prefetch files in the Prefetch folder.
+
== Cache files ==
 +
The cache is stored in multiple:
 +
{| class="wikitable"
 +
|-
 +
! Filename
 +
! Description
 +
|-
 +
| index
 +
| The index file
 +
|-
 +
| data_#
 +
| Data block files
 +
|-
 +
| f_######
 +
| (Separate) data stream file
 +
|}
  
 +
== Cache address ==
 +
The cache address is 4 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| <i>If file type is 0 (Separate file)</i>
 +
|
 +
|
 +
|
 +
|-
 +
| 0.0
 +
| 28 bits
 +
|
 +
| File number <br> The value represents the value of # in f_######
 +
|-
 +
| <i>Else</i>
 +
|
 +
|
 +
|
 +
|-
 +
| 0.0
 +
| 16 bits
 +
|
 +
| Block number
 +
|-
 +
| 2.0
 +
| 8 bits
 +
|
 +
| File number (or file selector) <br> The value represents the value of # in data_#
 +
|-
 +
| 3.0
 +
| 2 bits
 +
|
 +
| Block size <br> The number of contiguous blocks where 0 represents 1 block and 3 represents 4 blocks.
 +
|-
 +
| 3.2
 +
| 2 bits
 +
|
 +
| Reserved
 +
|-
 +
| <i>Common</i>
 +
|
 +
|
 +
|
 +
|-
 +
| 3.4
 +
| 3 bits
 +
|
 +
| File type
 +
|-
 +
| 3.7
 +
| 1 bit
 +
|
 +
| Initialized flag
 +
|}
  
== File format ==
+
=== File types ===
Each Prefetch file has a 4-byte signature (at offset 4) "SCCA" (or in hexadecimal notation 0x53 0x43 0x43 0x41). The signature is assumed to be preceded by a 4-byte format version indicator:
+
{| class="wikitable"
* 0x00000011 for Windows XP and Windows 2003
+
|-
* 0x00000017 for Windows Vista and Windows 7
+
! Value
* 0x0000001a for Windows 8.1
+
! Description
 +
|-
 +
| 0
 +
| (Separate) data stream file
 +
|-
 +
| 1
 +
| (Rankings) block data file (36 byte block data file)
 +
|-
 +
| 2
 +
| 256 byte block data file
 +
|-
 +
| 3
 +
| 1024 byte block data file
 +
|-
 +
| 4
 +
| 4096 byte block data file
 +
|-
 +
|
 +
|
 +
|-
 +
| 6
 +
| Unknown; seen on Mac OS  X 0x6f430074
 +
|}
  
For more information about the file format see: [[Windows Prefetch File Format]]
+
==== Examples ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 0x00000000
 +
| Not initialized
 +
|-
 +
| 0x8000002a
 +
| Data stream file: f_00002a
 +
|-
 +
| 0xa0010003
 +
| Block data file: data_1, block number 3, 1 block of size
 +
|}
  
== Timestamps ==
+
== Index file format (index) ==
 +
Overview:
 +
* File header
 +
* least recently used (LRU) data (or eviction control data)
 +
* index table
  
Both the [[NTFS]] timestamps for a Prefetch file and the timestamp embedded in each Prefetch file contain valuable information. The timestamp embedded within the Prefetch file is a 64-bit (QWORD) [http://msdn2.microsoft.com/en-us/library/ms724284.aspx FILETIME] object The creation date of the file indicates the first time the application was executed. Both the modification date of the file and the embedded timestamp indicate the last time the application was executed.
+
=== File header ===
 +
The index file header (struct IndexHeader) is 256 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 4
 +
| "\xc3\xca\x03\xc1"
 +
| Signature
 +
|-
 +
| 4
 +
| 2
 +
|
 +
| Minor version
 +
|-
 +
| 6
 +
| 2
 +
|
 +
| Major version
 +
|-
 +
| 8
 +
| 4
 +
|
 +
| Number of entries
 +
|-
 +
| 12
 +
| 4
 +
|
 +
| Stored data size
 +
|-
 +
| 16
 +
| 4
 +
|
 +
| Last created file number <br> The value represents the value of # in f_######
 +
|-
 +
| 20
 +
| 4
 +
|
 +
| Dirty flag <br> Identifier for all entries being changed
 +
|-
 +
| 24
 +
| 4
 +
|
 +
| Usage statistics data cache address
 +
|-
 +
| 28
 +
| 4
 +
|
 +
| Table size <br> Where 0 represents 0x10000 (is this the same as the file size?)
 +
|-
 +
| 32
 +
| 4
 +
|
 +
| Signals a previous crash
 +
|-
 +
| 36
 +
| 4
 +
|
 +
| Identifier of an ongoing test or experiment
 +
|-
 +
| 40
 +
| 8
 +
|
 +
| Creation time
 +
|-
 +
| 48
 +
| 52 x 8 = 208
 +
|
 +
| Padding <br> Contains 0-byte values
 +
|}
  
Windows will store timestamps according to Windows [http://msdn.microsoft.com/en-us/library/ms724290%28VS.85%29.aspx epoch].
+
==== Format versions ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
|2.0
 +
|
 +
|-
 +
| 2.1
 +
|
 +
|}
  
==== Creation Time ====
+
=== Least recently used (LRU) data ===
The creation time does not have a static offset on any Windows platform. The location of the creation time can be found using the offset 0x8 + length of Volume path offset. See section Volume for more information.
+
The least recently used (LRU) data (struct LruData) is 112 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 2 x 4 = 8
 +
|
 +
| Padding
 +
|-
 +
| 8
 +
| 4
 +
|
 +
| Filled flag <br> Value to indicate if when the cache was filled
 +
|-
 +
| 12
 +
| 5 x 4 = 20
 +
|
 +
| Array of sizes
 +
|-
 +
| 32
 +
| 5 x 4 = 20
 +
|
 +
| Array of head cache addresses
 +
|-
 +
| 52
 +
| 5 x 4 = 20
 +
|
 +
| Array of tail cache addresses
 +
|-
 +
| 72
 +
| 4
 +
|
 +
| Transaction cache address <br> Value to indicate an in-flight operation
 +
|-
 +
| 76
 +
| 4
 +
|
 +
| Operation <br> The in-flight operation
 +
|-
 +
| 80
 +
| 4
 +
|
 +
| Operations list <br> The in-flight operations list
 +
|-
 +
| 84
 +
| 7 x 4 = 28
 +
|
 +
| Padding <br> Contains 0-byte values
 +
|}
  
==== Last Run Time ====
+
=== Array indexes ===
A timestamp of when the application was last ran is embedded into the Prefetch file.
+
The array indexes correspond to the file types.
The offset from the beginning of the file to the "Last Run Time" is located:
+
{| class="wikitable"
* at offset 0x78 on Windows XP and Windows 2003.
+
|-
* at offset 0x80 on Windows Vista and Windows 7.
+
! Value
 +
! Description
 +
|-
 +
| 0
 +
| Separate file
 +
|-
 +
| 1
 +
| (Rankings) block data file
 +
|-
 +
| 2
 +
| 256 byte block data file
 +
|-
 +
| 3
 +
| 1024 byte block data file
 +
|-
 +
| 4
 +
| 4096 byte block data file
 +
|}
  
== MetaData ==
+
=== Index table ===
==== Header ====
+
The index table is an array of cache addresses.
In each Prefetch file, the size of the header is stored and can be found at offset 0x54 on [[Windows|Windows XP]], [[Windows|Windows Vista]], and [[Windows|Windows 7]]. The header size for [[Windows|Windows XP]] is 0x98 (152) and 0xf0 (240) on Windows Vista and Windows 7.
+
  
The Prefetch file will embed the application's name into the header at offset 0x10.
+
== Data block file format (data_#) ==
 +
Overview:
 +
* File header
 +
* array of blocks
  
==== Run Count ====
+
=== File header ===
The run count, or number of times the application has been run, is a 4-byte (DWORD) value located at offset 0x90 from the beginning of the file on [[Windows|Windows XP]]. On [[Windows|Windows Vista]] and [[Windows|Windows 7]], the run time can be found at 0x98.
+
The index file header (struct BlockFileHeader) is 8192 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 4
 +
| "\xc3\xca\x04\xc1"
 +
| Signature
 +
|-
 +
| 4
 +
| 2
 +
|
 +
| Minor version
 +
|-
 +
| 6
 +
| 2
 +
|
 +
| Major version
 +
|-
 +
| 8
 +
| 2
 +
|
 +
| File number (or file index) <br> The value represents the value of # in data_#
 +
|-
 +
| 10
 +
| 2
 +
|
 +
| Next file number (or next file index) <br> The value represents the value of # in data_#
 +
|-
 +
| 12
 +
| 4
 +
|
 +
| Block size (or cache entry) size
 +
|-
 +
| 16
 +
| 4
 +
|
 +
| Number of entries
 +
|-
 +
| 20
 +
| 4
 +
|
 +
| Maximum number of entries
 +
|-
 +
| 24
 +
| 4 x 4 = 16
 +
|
 +
| Array of empty entry counters <br> The counters of empty entries for each type
 +
|-
 +
| 40
 +
| 4 x 4 = 16
 +
|
 +
| Array of last used position (hints) <br> The last used position for each type
 +
|-
 +
| 56
 +
| 4
 +
|
 +
| Updating <br> Value to keep track of updates to the header
 +
|-
 +
| 60
 +
| 5 x 4 = 20
 +
|
 +
| User
 +
|-
 +
| 80
 +
| 2028 x 4 = 8112
 +
|
 +
| Block allocation bitmap
 +
|}
  
==== Volume ====
+
==== Format versions ====
Volume related information, volume path and volume serial number, are embedded into the Prefetch file. The precise offset for this information is the same for each Prefetch file and Windows operating system. In the header at offset 0x6c, the location of the volume path is stored. The location is a 4-bytes (DWORD) value.
+
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 2.0
 +
|
 +
|-
 +
| 2.1
 +
|
 +
|}
  
At the location given from offset 0x6c, a 4-byte value is stored which is the number of bytes from current offset (location from offset 0x6c) to the beginning of the volume path string. The location from the offset 0x6c, for ease of reading, will be called the "volume path offset." The volume path is embedded as an [http://en.wikipedia.org/wiki/UTF-16/UCS-2 UTF-16] encoded string.
+
=== Cache entry ===
 +
The cache entry (struct EntryStore) is 256 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 4
 +
|
 +
| Hash <br> The hash of the key
 +
|-
 +
| 4
 +
| 4
 +
|
 +
| Next entry cache address <br> The next entry with the same hash or bucket
 +
|-
 +
| 8
 +
| 4
 +
|
 +
| Rankings node cache address
 +
|-
 +
| 12
 +
| 4
 +
|
 +
| Reuse count <br> Value that indicates how often this entry was (re-)used
 +
|-
 +
| 16
 +
| 4
 +
|
 +
| Refetch count <br> Value that indicates how often this entry was (re-)fetched (or downloaded)
 +
|-
 +
| 20
 +
| 4
 +
|
 +
| Cache entry state
 +
|-
 +
| 24
 +
| 8
 +
|
 +
| Creation time
 +
|-
 +
| 32
 +
| 4
 +
|
 +
| Key data size
 +
|-
 +
| 36
 +
| 4
 +
|
 +
| Long key data cache address <br> The value is 0 if no long key data is present
 +
|-
 +
| 40
 +
| 4 x 4 = 16
 +
|
 +
| Array of data stream sizes
 +
|-
 +
| 56
 +
| 4 x 4 = 16
 +
|
 +
| Array of data stream cache addresses
 +
|-
 +
| 72
 +
| 4
 +
|
 +
| Cache entry flags
 +
|-
 +
| 76
 +
| 4 x 4 = 16
 +
|
 +
| Padding
 +
|-
 +
| 92
 +
| 4
 +
|
 +
| Self hash <br> The hash of the first 92 bytes of the cache entry is this used as a checksum?
 +
|-
 +
| 96
 +
| 160
 +
|
 +
| Key data <br> Contains an UTF-8 encoded string with an end-of-string character with the original URL
 +
|}
  
The length of the volume path string is a 4-byte value is located at volume path offset + 0x4.
+
==== Array indexes ====
 +
The data stream array indexes correspond to:
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 0
 +
| HTTP response headers
 +
|-
 +
| 1
 +
| Page contents (Payload)
 +
|-
 +
| 2
 +
|
 +
|-
 +
| 3
 +
|
 +
|}
  
The volume [http://en.wikipedia.org/wiki/Volume_serial_number serial number] is a 4-byte value that identifies a media storage. A serial number does not have a consistent offset within a Prefetch between Windows operating systems. The 4-byte value can be found eight (8) bytes from the creation time location. The [http://en.wikipedia.org/wiki/Vol_%28command%29 vol] command on Windows can verify the volume serial number.
+
==== Cache entry state ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Identifier
 +
! Description
 +
|-
 +
| 0
 +
| ENTRY_NORMAL
 +
| Normal cache entry
 +
|-
 +
| 1
 +
| ENTRY_EVICTED
 +
| The cache entry was recently evicted
 +
|-
 +
| 2
 +
| ENTRY_DOOMED
 +
| The cache entry was doomed
 +
|}
  
==== End of File ====
+
==== Cache entry flags ====
The end of file (EOF) for each Prefetch file is located at offset 0xc. The location of EOF also denotes the size of the Prefetch file.
+
{| class="wikitable"
 +
|-
 +
! Value
 +
! Identifier
 +
! Description
 +
|-
 +
| 0x00000001
 +
| PARENT_ENTRY
 +
| Parent cache entry
 +
|-
 +
| 0x00000002
 +
| CHILD_ENTRY
 +
| Child cache entry
 +
|}
  
==== Files ====
+
=== Rankings node ===
 +
The rankings node (struct RankingsNode) is 36 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 8
 +
|
 +
| Last used <br> Contains LRU information?
 +
|-
 +
| 8
 +
| 8
 +
|
 +
| Last modified <br> Contains LRU information?
 +
|-
 +
| 16
 +
| 4
 +
|
 +
| Next rankings node cache address
 +
|-
 +
| 20
 +
| 4
 +
|
 +
| Previous rankings node cache address
 +
|-
 +
| 24
 +
| 4
 +
|
 +
| Cache entry cache address
 +
|-
 +
| 28
 +
| 4
 +
|
 +
| Is dirty flag
 +
|-
 +
| 32
 +
| 4
 +
|
 +
| Self hash <br> The hash of the first 32 bytes of the rankings node is this used as a checksum?
 +
|}
  
Embedded within each Prefetch file are files and directories that were used doing the application's startup. The Prefetch file separates both filenames and directories into two different location in the file. Each string is encoded as a [http://en.wikipedia.org/wiki/UTF-16/UCS-2 UTF-16] string. Windows operating system uses UTF-16 encoding.
+
== Data stream ==
 +
The data stream is stored inside the data block file (data_#), for small amounts of data, or stored as a separate file (f_######). The data stream is stored as a [[gzip|gzip file]].
  
The offset to the first set of filenames are at 0x64. The size of the first set of filenames can be found at offset 0x68. Both offsets are consistent between Windows XP, Windows Vista and Windows 7.
+
Note that the last modification time of the gzip file is set to 0.
  
In the bottom section of the Prefetch file are UTF-16 strings of directories. At the time of this writing (7/2011), the precise offset and size of the directory listing is unknown. The distance between the end of the Volume Path string and the beginning of the directory strings is given. An approach to finding the offset to the beginning of the directories listing is to obtain the distance value and the offset when the Volume Path string ends (after the NULL bytes). The distance value is at volume path offset + 0x18 (24). The distance is a 4-byte (DWORD) value. The end of second set of strings will complete the Prefetch file. The size of the directory listing is calculated by subtracting the start position of the directory listing from the end of file position.
+
== Hash ==
 
+
The hash algorithm used is referred to as SuperFastHash. A pseudo C implementation:
== Registry Keys ==
+
 
<pre>
 
<pre>
Key: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\PrefetchParameters
+
uint32_t SuperFastHash(
</pre>
+
          const uint8_t *key,
 +
          size_t key_size )
 +
{
 +
    size_t key_index    = 0;
 +
    size_t remainder    = 0;
 +
    uint32_t hash_value = 0;
 +
    uint32_t temp_value = 0;
  
The EnablePrefetcher Registry value can be used to disable prefetch.
+
    if( ( key == NULL ) || ( key_siz 0 ) )
 +
    {
 +
      return( 0 );
 +
    }
 +
    remainder = key_size % 4;
 +
    key_size -= remainder;
  
== See Also ==
+
    for( key_index = 0;
* [[Windows Prefetch File Format]]
+
        key_index < key_size;
* [[SuperFetch]]
+
        key_index += 4 )
* [[Prefetch XML]]
+
    {
* [[Windows]]
+
        hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
 +
        temp_value  = key[ key_index + 2 ] + ( key[ key_index + 3 ] << 8 );
  
== External Links ==
+
        temp_value = ( temp_value << 11 ) ^ hash_value;
* [http://www.microsoft.com/whdc/driver/kernel/XP_kernel.mspx#ECLAC Microsoft's description of Prefetch when Windows XP was introduced]
+
* [http://msdn.microsoft.com/msdnmag/issues/01/12/XPKernel/default.aspx More detail from Microsoft]
+
* [http://en.wikipedia.org/wiki/Prefetcher Wikipedia Prefetcher]
+
* [http://msdn.microsoft.com/en-us/library/ms940847(v=winembedded.5).aspx MSDN: Disabling Prefetch]
+
* [http://blogs.msdn.com/b/e7/archive/2009/05/05/support-and-q-a-for-solid-state-drives-and.aspx Support and Q&A for Solid-State Drives], by Steven Sinofsky, May 5, 2009
+
* [http://computer-forensics.sans.org/blog/2009/08/05/de-mystifying-defrag-identifying-when-defrag-has-been-used-for-anti-forensics-part-1-windows-xp/ De-mystifying Defrag: Identifying When Defrag Has Been Used for Anti-Forensics (Part 1 - Windows XP)], by [[Chad Tilbury]], August 5, 2009
+
* [http://www.dfinews.com/articles/2010/12/decoding-prefetch-files-forensic-purposes-part-1 Decoding Prefetch Files for Forensic Purposes: Part 1], by [[Mark Wade]], December 8, 2010
+
* [http://crucialsecurityblog.harris.com/2011/04/11/prefetch-files-at-face-value/ Prefetch Files at Face Value], by [[Mark Wade]], April 11, 2011
+
* [http://windowsir.blogspot.ch/2012/03/prefetch-analysis-revisited.html Prefetch Analysis, Revisited], by [[Harlan Carvey]], March 8, 2012
+
* [http://journeyintoir.blogspot.ch/2012/12/ntosboot-prefetch-file.html NTOSBOOT Prefetch File], by [[Corey Harrell]], December 5, 2012
+
  
== Tools ==
+
        hash_value  = ( hash_value << 16 ) ^ temp_value;
 +
        hash_value += hash_value >> 11;
 +
    }
  
=== Commercial ===
+
    switch( remainder )
 +
    {
 +
        case 3:
 +
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
 +
            hash_value ^= hash_value<< 16;
 +
            hash_value ^= key[ key_index + 2 ] << 18;
 +
            hash_value += hash_value >> 11;
 +
            break;
  
=== Free - Non Open Source ===
+
        case 2:
* [http://www.woanware.co.uk/forensics/prefetchforensics.html PrefetchForensics], PrefetchForensics is an application to extract information from Windows Prefetch files
+
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
* [http://redwolfcomputerforensics.com/index.php?option=com_content&task=view&id=42&Itemid=55 Prefetch-Parser], Parse the prefetch files and display information
+
            hash_value ^= hash_value << 11;
* [http://www.mitec.cz/wfa.html Windows File Analyzer] - Parses Prefetch files, thumbnail databases, shortcuts, index.dat files, and the recycle bin
+
            hash_value += hash_value >> 17;
* [http://www.tzworks.net/prototype_page.php?proto_id=1 Windows Prefetch Parser (pf)], Free tool that can be run on Windows, Linux or Mac OS-X
+
            break;
 +
 
 +
        case 1:
 +
            hash_value += key[ key_index ];
 +
            hash_value ^= hash_value << 10;
 +
            hash_value += hash_value >> 1;
 +
            break;
 +
    }
 +
 
 +
    /* Force "avalanching" of final 127 bits */
 +
    hash_value ^= hash_value << 3;
 +
    hash_value += hash_value >> 5;
 +
    hash_value ^= hash_value << 4;
 +
    hash_value += hash_value >> 17;
 +
    hash_value ^= hash_value << 25;
 +
    hash_value += hash_value >> 6;
 +
 
 +
    return hash_value;
 +
}
 +
</pre>
 +
 
 +
== See Also ==
 +
* [[Google Chrome]]
 +
* [[gzip]]
 +
 
 +
== External Links ==
 +
* [http://www.chromium.org/developers/design-documents/network-stack/disk-cache Disk Cache], The Chromium Projects
  
=== Open Source ===
+
[[Category:File Formats]]
* [https://code.google.com/p/prefetch-tool/ prefetch-tool], Script to extract information from windows prefetch folder
+

Revision as of 04:17, 22 June 2014

Information icon.png

Please help to improve this article by expanding it.
Further information might be found on the discussion page.

Cache files

The cache is stored in multiple:

Filename Description
index The index file
data_# Data block files
f_###### (Separate) data stream file

Cache address

The cache address is 4 bytes in size and consists of:

offset size value description
If file type is 0 (Separate file)
0.0 28 bits File number
The value represents the value of # in f_######
Else
0.0 16 bits Block number
2.0 8 bits File number (or file selector)
The value represents the value of # in data_#
3.0 2 bits Block size
The number of contiguous blocks where 0 represents 1 block and 3 represents 4 blocks.
3.2 2 bits Reserved
Common
3.4 3 bits File type
3.7 1 bit Initialized flag

File types

Value Description
0 (Separate) data stream file
1 (Rankings) block data file (36 byte block data file)
2 256 byte block data file
3 1024 byte block data file
4 4096 byte block data file
6 Unknown; seen on Mac OS X 0x6f430074

Examples

Value Description
0x00000000 Not initialized
0x8000002a Data stream file: f_00002a
0xa0010003 Block data file: data_1, block number 3, 1 block of size

Index file format (index)

Overview:

  • File header
  • least recently used (LRU) data (or eviction control data)
  • index table

File header

The index file header (struct IndexHeader) is 256 bytes in size and consists of:

offset size value description
0 4 "\xc3\xca\x03\xc1" Signature
4 2 Minor version
6 2 Major version
8 4 Number of entries
12 4 Stored data size
16 4 Last created file number
The value represents the value of # in f_######
20 4 Dirty flag
Identifier for all entries being changed
24 4 Usage statistics data cache address
28 4 Table size
Where 0 represents 0x10000 (is this the same as the file size?)
32 4 Signals a previous crash
36 4 Identifier of an ongoing test or experiment
40 8 Creation time
48 52 x 8 = 208 Padding
Contains 0-byte values

Format versions

Value Description
2.0
2.1

Least recently used (LRU) data

The least recently used (LRU) data (struct LruData) is 112 bytes in size and consists of:

offset size value description
0 2 x 4 = 8 Padding
8 4 Filled flag
Value to indicate if when the cache was filled
12 5 x 4 = 20 Array of sizes
32 5 x 4 = 20 Array of head cache addresses
52 5 x 4 = 20 Array of tail cache addresses
72 4 Transaction cache address
Value to indicate an in-flight operation
76 4 Operation
The in-flight operation
80 4 Operations list
The in-flight operations list
84 7 x 4 = 28 Padding
Contains 0-byte values

Array indexes

The array indexes correspond to the file types.

Value Description
0 Separate file
1 (Rankings) block data file
2 256 byte block data file
3 1024 byte block data file
4 4096 byte block data file

Index table

The index table is an array of cache addresses.

Data block file format (data_#)

Overview:

  • File header
  • array of blocks

File header

The index file header (struct BlockFileHeader) is 8192 bytes in size and consists of:

offset size value description
0 4 "\xc3\xca\x04\xc1" Signature
4 2 Minor version
6 2 Major version
8 2 File number (or file index)
The value represents the value of # in data_#
10 2 Next file number (or next file index)
The value represents the value of # in data_#
12 4 Block size (or cache entry) size
16 4 Number of entries
20 4 Maximum number of entries
24 4 x 4 = 16 Array of empty entry counters
The counters of empty entries for each type
40 4 x 4 = 16 Array of last used position (hints)
The last used position for each type
56 4 Updating
Value to keep track of updates to the header
60 5 x 4 = 20 User
80 2028 x 4 = 8112 Block allocation bitmap

Format versions

Value Description
2.0
2.1

Cache entry

The cache entry (struct EntryStore) is 256 bytes in size and consists of:

offset size value description
0 4 Hash
The hash of the key
4 4 Next entry cache address
The next entry with the same hash or bucket
8 4 Rankings node cache address
12 4 Reuse count
Value that indicates how often this entry was (re-)used
16 4 Refetch count
Value that indicates how often this entry was (re-)fetched (or downloaded)
20 4 Cache entry state
24 8 Creation time
32 4 Key data size
36 4 Long key data cache address
The value is 0 if no long key data is present
40 4 x 4 = 16 Array of data stream sizes
56 4 x 4 = 16 Array of data stream cache addresses
72 4 Cache entry flags
76 4 x 4 = 16 Padding
92 4 Self hash
The hash of the first 92 bytes of the cache entry is this used as a checksum?
96 160 Key data
Contains an UTF-8 encoded string with an end-of-string character with the original URL

Array indexes

The data stream array indexes correspond to:

Value Description
0 HTTP response headers
1 Page contents (Payload)
2
3

Cache entry state

Value Identifier Description
0 ENTRY_NORMAL Normal cache entry
1 ENTRY_EVICTED The cache entry was recently evicted
2 ENTRY_DOOMED The cache entry was doomed

Cache entry flags

Value Identifier Description
0x00000001 PARENT_ENTRY Parent cache entry
0x00000002 CHILD_ENTRY Child cache entry

Rankings node

The rankings node (struct RankingsNode) is 36 bytes in size and consists of:

offset size value description
0 8 Last used
Contains LRU information?
8 8 Last modified
Contains LRU information?
16 4 Next rankings node cache address
20 4 Previous rankings node cache address
24 4 Cache entry cache address
28 4 Is dirty flag
32 4 Self hash
The hash of the first 32 bytes of the rankings node is this used as a checksum?

Data stream

The data stream is stored inside the data block file (data_#), for small amounts of data, or stored as a separate file (f_######). The data stream is stored as a gzip file.

Note that the last modification time of the gzip file is set to 0.

Hash

The hash algorithm used is referred to as SuperFastHash. A pseudo C implementation:

uint32_t SuperFastHash(
          const uint8_t *key,
          size_t key_size )
{ 
    size_t key_index    = 0;
    size_t remainder    = 0;
    uint32_t hash_value = 0;
    uint32_t temp_value = 0;

    if( ( key == NULL ) || ( key_siz 0 ) )
    {
       return( 0 );
    }
    remainder = key_size % 4;
    key_size -= remainder;

    for( key_index = 0;
         key_index < key_size;
         key_index += 4 )
    {
        hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
        temp_value  = key[ key_index + 2 ] + ( key[ key_index + 3 ] << 8 );

        temp_value = ( temp_value << 11 ) ^ hash_value;

        hash_value  = ( hash_value << 16 ) ^ temp_value;
        hash_value += hash_value >> 11;
    }

    switch( remainder )
    {
        case 3:
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
            hash_value ^= hash_value<< 16;
            hash_value ^= key[ key_index + 2 ] << 18;
            hash_value += hash_value >> 11;
            break;

        case 2:
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
            hash_value ^= hash_value << 11;
            hash_value += hash_value >> 17;
            break;

        case 1:
            hash_value += key[ key_index ];
            hash_value ^= hash_value << 10;
            hash_value += hash_value >> 1;
            break;
    }

    /* Force "avalanching" of final 127 bits */
    hash_value ^= hash_value << 3;
    hash_value += hash_value >> 5;
    hash_value ^= hash_value << 4;
    hash_value += hash_value >> 17;
    hash_value ^= hash_value << 25;
    hash_value += hash_value >> 6;

    return hash_value;
}

See Also

External Links