Difference between pages "Plaso" and "Chrome Disk Cache Format"

From ForensicsWiki
(Difference between pages)
Jump to: navigation, search
(File formats)
 
(External Links)
 
Line 1: Line 1:
{{Infobox_Software |
+
{{expand}}
  name = plaso |
+
  maintainer = [[Kristinn Gudjonsson]], [[Joachim Metz]] |
+
  os = [[Linux]], [[Mac OS X]], [[Windows]] |
+
  genre = {{Analysis}} |
+
  license = {{APL}} |
+
  website = [https://code.google.com/p/plaso/ code.google.com/p/plaso/] |
+
}}
+
  
Plaso (plaso langar að safna öllu) is the Python based back-end engine used by tools such as log2timeline for automatic creation of a super timelines. The goal of log2timeline (and thus plaso) is to provide a single tool that can parse various log files and forensic artifacts from computers and related systems, such as network equipment to produce a single correlated timeline. This timeline can then be easily analysed by forensic investigators/analysts, speeding up investigations by correlating the vast amount of information found on an average computer system. Plaso is intended to be applied for creating super timelines but also supports creating [http://blog.kiddaland.net/2013/02/targeted-timelines-part-i.html targeted timelines].
+
== Cache files ==
 +
The cache is stored in multiple:
 +
{| class="wikitable"
 +
|-
 +
! Filename
 +
! Description
 +
|-
 +
| index
 +
| The index file
 +
|-
 +
| data_#
 +
| Data block files
 +
|-
 +
| f_######
 +
| (Separate) data stream file
 +
|}
  
The Plaso project site also provides [[4n6time]], formerly "l2t_Review", which is a cross-platform forensic tool for timeline creation and review by [[David Nides]].
+
== Cache address ==
 +
The cache address is 4 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| <i>If file type is 0 (Separate file)</i>
 +
|
 +
|
 +
|
 +
|-
 +
| 0.0
 +
| 28 bits
 +
|
 +
| File number <br> The value represents the value of # in f_######
 +
|-
 +
| <i>Else</i>
 +
|
 +
|
 +
|
 +
|-
 +
| 0.0
 +
| 16 bits
 +
|
 +
| Block number
 +
|-
 +
| 2.0
 +
| 8 bits
 +
|
 +
| File number (or file selector) <br> The value represents the value of # in data_#
 +
|-
 +
| 3.0
 +
| 2 bits
 +
|
 +
| Block size <br> The number of contiguous blocks where 0 represents 1 block and 3 represents 4 blocks.
 +
|-
 +
| 3.2
 +
| 2 bits
 +
|
 +
| Reserved
 +
|-
 +
| <i>Common</i>
 +
|
 +
|
 +
|
 +
|-
 +
| 3.4
 +
| 3 bits
 +
|
 +
| File type
 +
|-
 +
| 3.7
 +
| 1 bit
 +
|
 +
| Initialized flag
 +
|}
  
== Supported Formats ==
+
=== File types ===
The information below is based of version 1.1.0
+
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 0
 +
| (Separate) data stream file
 +
|-
 +
| 1
 +
| (Rankings) block data file (36 byte block data file)
 +
|-
 +
| 2
 +
| 256 byte block data file
 +
|-
 +
| 3
 +
| 1024 byte block data file
 +
|-
 +
| 4
 +
| 4096 byte block data file
 +
|-
 +
|
 +
|
 +
|-
 +
| 6
 +
| Unknown; seen on Mac OS  X 0x6f430074
 +
|}
  
=== Storage Media Image File Formats ===
+
==== Examples ====
Storage Medis Image File Format support is provided by [[dfvfs]].
+
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 0x00000000
 +
| Not initialized
 +
|-
 +
| 0x8000002a
 +
| Data stream file: f_00002a
 +
|-
 +
| 0xa0010003
 +
| Block data file: data_1, block number 3, 1 block of size
 +
|}
  
=== Volume System Formats ===
+
== Index file format (index) ==
Volume System Format support is provided by [[dfvfs]].
+
Overview:
 +
* File header
 +
* least recently used (LRU) data (or eviction control data)
 +
* index table
  
=== File System Formats ===
+
=== File header ===
File System Format support is provided by [[dfvfs]].
+
The index file header (struct IndexHeader) is 256 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 4
 +
| "\xc3\xca\x03\xc1"
 +
| Signature
 +
|-
 +
| 4
 +
| 2
 +
|
 +
| Minor version
 +
|-
 +
| 6
 +
| 2
 +
|
 +
| Major version
 +
|-
 +
| 8
 +
| 4
 +
|
 +
| Number of entries
 +
|-
 +
| 12
 +
| 4
 +
|
 +
| Stored data size
 +
|-
 +
| 16
 +
| 4
 +
|
 +
| Last created file number <br> The value represents the value of # in f_######
 +
|-
 +
| 20
 +
| 4
 +
|
 +
| Dirty flag <br> Identifier for all entries being changed
 +
|-
 +
| 24
 +
| 4
 +
|
 +
| Usage statistics data cache address
 +
|-
 +
| 28
 +
| 4
 +
|
 +
| Table size <br> Where 0 represents 0x10000 (is this the same as the file size?)
 +
|-
 +
| 32
 +
| 4
 +
|
 +
| Signals a previous crash
 +
|-
 +
| 36
 +
| 4
 +
|
 +
| Identifier of an ongoing test or experiment
 +
|-
 +
| 40
 +
| 8
 +
|
 +
| Creation time
 +
|-
 +
| 48
 +
| 52 x 8 = 208
 +
|
 +
| Padding <br> Contains 0-byte values
 +
|}
  
=== File formats ===
+
==== Format versions ====
* Apple System Log (ASL)
+
{| class="wikitable"
* Basic Security Module (BSM)
+
|-
* Bencode files
+
! Value
* [[Chrome Disk Cache Format|Chrome cache files]]
+
! Description
* CUPS IPP
+
|-
* [[Extensible Storage Engine (ESE) Database File (EDB) format]] using [[libesedb]]
+
|2.0
* Firefox Cache
+
|  
* [[Java|Java IDX]]
+
|-
* MacOS-X Application firewall
+
| 2.1
* MacOS-X Keychain
+
|  
* MacOS-X Securityd
+
|}
* MacOS-X Wifi
+
* ([[SleuthKit]]) mactime logs
+
* McAfee Anti-Virus Logs
+
* Microsoft [[Internet Explorer History File Format]] (also known as MSIE 4 - 9 Cache Files or index.dat) using [[libmsiecf]]
+
* [[OLE Compound File]] using [[libolecf]]
+
* [[Opera|Opera Browser history]]
+
* OpenXML
+
* Pcap files
+
* Popularity Contest log
+
* [[Property list (plist)|Property list (plist) format]] using [[binplist]]
+
* SELinux audit logs
+
* SkyDrive log and error log files
+
* [[SQLite database format]] using [[SQLite]]
+
* Symantec AV Corporate Edition and Endpoint Protection log
+
* Syslog
+
* UTMP
+
* UTMPX
+
* [[Windows Event Log (EVT)]] using [[libevt]]
+
* Windows Firewall
+
* Windows Job files (also known as "at jobs")
+
* [[Windows Prefetch File Format|Windows Prefetch File format]]
+
* [[Windows#Recycle_Bin|Windows Recycle bin]] (INFO2 and $I/$R)
+
* [[Windows NT Registry File (REGF)]] using [[libregf]]
+
* [[LNK|Windows Shortcut File (LNK) format]] using [[liblnk]]
+
* [[Windows XML Event Log (EVTX)]] using [[libevtx]]
+
* Xchat and Xchat scrollback files
+
  
=== Bencode file formats ===
+
=== Least recently used (LRU) data ===
* Transmission
+
The least recently used (LRU) data (struct LruData) is 112 bytes in size and consists of:
* uTorrent
+
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 2 x 4 = 8
 +
|
 +
| Padding
 +
|-
 +
| 8
 +
| 4
 +
|
 +
| Filled flag <br> Value to indicate if when the cache was filled
 +
|-
 +
| 12
 +
| 5 x 4 = 20
 +
|
 +
| Array of sizes
 +
|-
 +
| 32
 +
| 5 x 4 = 20
 +
|
 +
| Array of head cache addresses
 +
|-
 +
| 52
 +
| 5 x 4 = 20
 +
|
 +
| Array of tail cache addresses
 +
|-
 +
| 72
 +
| 4
 +
|
 +
| Transaction cache address <br> Value to indicate an in-flight operation
 +
|-
 +
| 76
 +
| 4
 +
|
 +
| Operation <br> The in-flight operation
 +
|-
 +
| 80
 +
| 4
 +
|
 +
| Operations list <br> The in-flight operations list
 +
|-
 +
| 84
 +
| 7 x 4 = 28
 +
|
 +
| Padding <br> Contains 0-byte values
 +
|}
  
=== ESE database file formats ===
+
=== Array indexes ===
* Internet Explorer WebCache format
+
The array indexes correspond to the file types.
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 0
 +
| Separate file
 +
|-
 +
| 1
 +
| (Rankings) block data file
 +
|-
 +
| 2
 +
| 256 byte block data file
 +
|-
 +
| 3
 +
| 1024 byte block data file
 +
|-
 +
| 4
 +
| 4096 byte block data file
 +
|}
  
=== OLE Compound File formats ===
+
=== Index table ===
* Document summary information
+
The index table is an array of cache addresses.
* Summary information (top-level only)
+
  
=== Property list (plist) formats ===
+
== Data block file format (data_#) ==
* Airport
+
Overview:
* Apple Account
+
* File header
* Bluetooth
+
* array of blocks
* Install History
+
* iPod/iPhone
+
* Mac User
+
* [[Apple Safari|Safari history]]
+
* Software Update
+
* Spotlight
+
* Spotlight Volume Information
+
* Timemachine
+
  
=== SQLite database file formats ===
+
=== File header ===
* Android call logs
+
The index file header (struct BlockFileHeader) is 8192 bytes in size and consists of:
* Android SMS
+
{| class="wikitable"
* Chrome cookies
+
|-
* [[Google Chrome|Chrome browsing and downloads history]]
+
! offset
* [[Mozilla Firefox|Firefox browsing and downloads history]]
+
! size
* Google Drive
+
! value
* Launch services quarantine events
+
! description
* MacKeeper cache
+
|-
* Mac OS X document versions
+
| 0
* Skype text conversations
+
| 4
* [[Zeitgeist|Zeitgeist activity database]]
+
| "\xc3\xca\x04\xc1"
 +
| Signature
 +
|-
 +
| 4
 +
| 2
 +
|
 +
| Minor version
 +
|-
 +
| 6
 +
| 2
 +
|
 +
| Major version
 +
|-
 +
| 8
 +
| 2
 +
|
 +
| File number (or file index) <br> The value represents the value of # in data_#
 +
|-
 +
| 10
 +
| 2
 +
|
 +
| Next file number (or next file index) <br> The value represents the value of # in data_#
 +
|-
 +
| 12
 +
| 4
 +
|
 +
| Block size (or cache entry) size
 +
|-
 +
| 16
 +
| 4
 +
|
 +
| Number of entries
 +
|-
 +
| 20
 +
| 4
 +
|
 +
| Maximum number of entries
 +
|-
 +
| 24
 +
| 4 x 4 = 16
 +
|
 +
| Array of empty entry counters <br> The counters of empty entries for each type
 +
|-
 +
| 40
 +
| 4 x 4 = 16
 +
|
 +
| Array of last used position (hints) <br> The last used position for each type
 +
|-
 +
| 56
 +
| 4
 +
|
 +
| Updating <br> Value to keep track of updates to the header
 +
|-
 +
| 60
 +
| 5 x 4 = 20
 +
|
 +
| User
 +
|-
 +
| 80
 +
| 2028 x 4 = 8112
 +
|
 +
| Block allocation bitmap
 +
|}
  
=== [[Windows Registry]] formats ===
+
==== Format versions ====
* [[Windows Application Compatibility|AppCompatCache]]
+
{| class="wikitable"
* CCleaner
+
|-
* Less Frequently Used (LFU)
+
! Value
* MountPoints2
+
! Description
* Most Recently Used (MRU) MRUList and MRUListEx (no shell item support)
+
|-
* [[Internet Explorer|MSIE Zones]]
+
| 2.0
* Office MRU
+
|
* Outlook Search
+
|-
* [[Windows Registry#Run/RunOnce|Run and RunOnce keys]]
+
| 2.1
* Services
+
|
* Terminal Server MRU
+
|}
* Typed URLS
+
* USBStor
+
* UserAssist
+
* WinRar
+
* Windows version information
+
  
== History ==
+
=== Cache entry ===
Plaso is a Python-based rewrite of the Perl-based [[log2timeline]] initially created by [[Kristinn Gudjonsson]]. Plaso builds upon the [[SleuthKit]], [[libyal]], [[dfvfs]] and various other projects.
+
The cache entry (struct EntryStore) is 256 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 4
 +
|
 +
| Hash <br> The hash of the key
 +
|-
 +
| 4
 +
| 4
 +
|
 +
| Next entry cache address <br> The next entry with the same hash or bucket
 +
|-
 +
| 8
 +
| 4
 +
|
 +
| Rankings node cache address
 +
|-
 +
| 12
 +
| 4
 +
|
 +
| Reuse count <br> Value that indicates how often this entry was (re-)used
 +
|-
 +
| 16
 +
| 4
 +
|
 +
| Refetch count <br> Value that indicates how often this entry was (re-)fetched (or downloaded)
 +
|-
 +
| 20
 +
| 4
 +
|
 +
| Cache entry state
 +
|-
 +
| 24
 +
| 8
 +
|
 +
| Creation time
 +
|-
 +
| 32
 +
| 4
 +
|
 +
| Key data size
 +
|-
 +
| 36
 +
| 4
 +
|
 +
| Long key data cache address <br> The value is 0 if no long key data is present
 +
|-
 +
| 40
 +
| 4 x 4 = 16
 +
|
 +
| Array of data stream sizes
 +
|-
 +
| 56
 +
| 4 x 4 = 16
 +
|
 +
| Array of data stream cache addresses
 +
|-
 +
| 72
 +
| 4
 +
|
 +
| Cache entry flags
 +
|-
 +
| 76
 +
| 4 x 4 = 16
 +
|
 +
| Padding
 +
|-
 +
| 92
 +
| 4
 +
|
 +
| Self hash <br> The hash of the first 92 bytes of the cache entry is this used as a checksum?
 +
|-
 +
| 96
 +
| 160
 +
|
 +
| Key data <br> Contains an UTF-8 encoded string with an end-of-string character with the original URL
 +
|}
 +
 
 +
==== Array indexes ====
 +
The data stream array indexes correspond to:
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 0
 +
| HTTP response headers
 +
|-
 +
| 1
 +
| Page contents (Payload)
 +
|-
 +
| 2
 +
|
 +
|-
 +
| 3
 +
|
 +
|}
 +
 
 +
==== Cache entry state ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Identifier
 +
! Description
 +
|-
 +
| 0
 +
| ENTRY_NORMAL
 +
| Normal cache entry
 +
|-
 +
| 1
 +
| ENTRY_EVICTED
 +
| The cache entry was recently evicted
 +
|-
 +
| 2
 +
| ENTRY_DOOMED
 +
| The cache entry was doomed
 +
|}
 +
 
 +
==== Cache entry flags ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Identifier
 +
! Description
 +
|-
 +
| 0x00000001
 +
| PARENT_ENTRY
 +
| Parent cache entry
 +
|-
 +
| 0x00000002
 +
| CHILD_ENTRY
 +
| Child cache entry
 +
|}
 +
 
 +
=== Rankings node ===
 +
The rankings node (struct RankingsNode) is 36 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 8
 +
|
 +
| Last used <br> Contains LRU information?
 +
|-
 +
| 8
 +
| 8
 +
|
 +
| Last modified <br> Contains LRU information?
 +
|-
 +
| 16
 +
| 4
 +
|
 +
| Next rankings node cache address
 +
|-
 +
| 20
 +
| 4
 +
|
 +
| Previous rankings node cache address
 +
|-
 +
| 24
 +
| 4
 +
|
 +
| Cache entry cache address
 +
|-
 +
| 28
 +
| 4
 +
|
 +
| Is dirty flag
 +
|-
 +
| 32
 +
| 4
 +
|
 +
| Self hash <br> The hash of the first 32 bytes of the rankings node is this used as a checksum?
 +
|}
 +
 
 +
== Data stream ==
 +
The data stream is stored inside the data block file (data_#), for small amounts of data, or stored as a separate file (f_######). The data stream is stored as a [[gzip|gzip file]].
 +
 
 +
Note that the last modification time of the gzip file is set to 0.
 +
 
 +
== Hash ==
 +
The hash algorithm used is referred to as SuperFastHash. A pseudo C implementation:
 +
<pre>
 +
uint32_t SuperFastHash(
 +
          const uint8_t *key,
 +
          size_t key_size )
 +
{
 +
    size_t key_index    = 0;
 +
    size_t remainder    = 0;
 +
    uint32_t hash_value = 0;
 +
    uint32_t temp_value = 0;
 +
 
 +
    if( ( key == NULL ) || ( key_siz 0 ) )
 +
    {
 +
      return( 0 );
 +
    }
 +
    remainder = key_size % 4;
 +
    key_size -= remainder;
 +
 
 +
    for( key_index = 0;
 +
        key_index < key_size;
 +
        key_index += 4 )
 +
    {
 +
        hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
 +
        temp_value  = key[ key_index + 2 ] + ( key[ key_index + 3 ] << 8 );
 +
 
 +
        temp_value = ( temp_value << 11 ) ^ hash_value;
 +
 
 +
        hash_value  = ( hash_value << 16 ) ^ temp_value;
 +
        hash_value += hash_value >> 11;
 +
    }
 +
 
 +
    switch( remainder )
 +
    {
 +
        case 3:
 +
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
 +
            hash_value ^= hash_value<< 16;
 +
            hash_value ^= key[ key_index + 2 ] << 18;
 +
            hash_value += hash_value >> 11;
 +
            break;
 +
 
 +
        case 2:
 +
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
 +
            hash_value ^= hash_value << 11;
 +
            hash_value += hash_value >> 17;
 +
            break;
 +
 
 +
        case 1:
 +
            hash_value += key[ key_index ];
 +
            hash_value ^= hash_value << 10;
 +
            hash_value += hash_value >> 1;
 +
            break;
 +
    }
 +
 
 +
    /* Force "avalanching" of final 127 bits */
 +
    hash_value ^= hash_value << 3;
 +
    hash_value += hash_value >> 5;
 +
    hash_value ^= hash_value << 4;
 +
    hash_value += hash_value >> 17;
 +
    hash_value ^= hash_value << 25;
 +
    hash_value += hash_value >> 6;
 +
 
 +
    return hash_value;
 +
}
 +
</pre>
  
 
== See Also ==
 
== See Also ==
* [[dfvfs]]
+
* [[Google Chrome]]
* [[log2timeline]]
+
* [[gzip]]
  
 
== External Links ==
 
== External Links ==
* [https://code.google.com/p/plaso/ Project site]
+
* [http://www.chromium.org/developers/design-documents/network-stack/disk-cache Disk Cache], The Chromium Projects
* [https://sites.google.com/a/kiddaland.net/plaso/home Project documentation]
+
* [https://e366e647f8637dd31e0a13f75e5469341a9ab0ee.googledrive.com/host/0B30H7z4S52FleW5vUHBnblJfcjg/Docs/Chrome%20Cache%20file%20format.pdf Chrome Cache file format], by the [[plaso|plaso project]], April 2014
* [http://blog.kiddaland.net/ Project blog]
+
 
* [https://sites.google.com/a/kiddaland.net/plaso/usage/4n6time 4n6time]
+
[[Category:File Formats]]

Latest revision as of 04:21, 22 June 2014

Information icon.png

Please help to improve this article by expanding it.
Further information might be found on the discussion page.

Cache files

The cache is stored in multiple:

Filename Description
index The index file
data_# Data block files
f_###### (Separate) data stream file

Cache address

The cache address is 4 bytes in size and consists of:

offset size value description
If file type is 0 (Separate file)
0.0 28 bits File number
The value represents the value of # in f_######
Else
0.0 16 bits Block number
2.0 8 bits File number (or file selector)
The value represents the value of # in data_#
3.0 2 bits Block size
The number of contiguous blocks where 0 represents 1 block and 3 represents 4 blocks.
3.2 2 bits Reserved
Common
3.4 3 bits File type
3.7 1 bit Initialized flag

File types

Value Description
0 (Separate) data stream file
1 (Rankings) block data file (36 byte block data file)
2 256 byte block data file
3 1024 byte block data file
4 4096 byte block data file
6 Unknown; seen on Mac OS X 0x6f430074

Examples

Value Description
0x00000000 Not initialized
0x8000002a Data stream file: f_00002a
0xa0010003 Block data file: data_1, block number 3, 1 block of size

Index file format (index)

Overview:

  • File header
  • least recently used (LRU) data (or eviction control data)
  • index table

File header

The index file header (struct IndexHeader) is 256 bytes in size and consists of:

offset size value description
0 4 "\xc3\xca\x03\xc1" Signature
4 2 Minor version
6 2 Major version
8 4 Number of entries
12 4 Stored data size
16 4 Last created file number
The value represents the value of # in f_######
20 4 Dirty flag
Identifier for all entries being changed
24 4 Usage statistics data cache address
28 4 Table size
Where 0 represents 0x10000 (is this the same as the file size?)
32 4 Signals a previous crash
36 4 Identifier of an ongoing test or experiment
40 8 Creation time
48 52 x 8 = 208 Padding
Contains 0-byte values

Format versions

Value Description
2.0
2.1

Least recently used (LRU) data

The least recently used (LRU) data (struct LruData) is 112 bytes in size and consists of:

offset size value description
0 2 x 4 = 8 Padding
8 4 Filled flag
Value to indicate if when the cache was filled
12 5 x 4 = 20 Array of sizes
32 5 x 4 = 20 Array of head cache addresses
52 5 x 4 = 20 Array of tail cache addresses
72 4 Transaction cache address
Value to indicate an in-flight operation
76 4 Operation
The in-flight operation
80 4 Operations list
The in-flight operations list
84 7 x 4 = 28 Padding
Contains 0-byte values

Array indexes

The array indexes correspond to the file types.

Value Description
0 Separate file
1 (Rankings) block data file
2 256 byte block data file
3 1024 byte block data file
4 4096 byte block data file

Index table

The index table is an array of cache addresses.

Data block file format (data_#)

Overview:

  • File header
  • array of blocks

File header

The index file header (struct BlockFileHeader) is 8192 bytes in size and consists of:

offset size value description
0 4 "\xc3\xca\x04\xc1" Signature
4 2 Minor version
6 2 Major version
8 2 File number (or file index)
The value represents the value of # in data_#
10 2 Next file number (or next file index)
The value represents the value of # in data_#
12 4 Block size (or cache entry) size
16 4 Number of entries
20 4 Maximum number of entries
24 4 x 4 = 16 Array of empty entry counters
The counters of empty entries for each type
40 4 x 4 = 16 Array of last used position (hints)
The last used position for each type
56 4 Updating
Value to keep track of updates to the header
60 5 x 4 = 20 User
80 2028 x 4 = 8112 Block allocation bitmap

Format versions

Value Description
2.0
2.1

Cache entry

The cache entry (struct EntryStore) is 256 bytes in size and consists of:

offset size value description
0 4 Hash
The hash of the key
4 4 Next entry cache address
The next entry with the same hash or bucket
8 4 Rankings node cache address
12 4 Reuse count
Value that indicates how often this entry was (re-)used
16 4 Refetch count
Value that indicates how often this entry was (re-)fetched (or downloaded)
20 4 Cache entry state
24 8 Creation time
32 4 Key data size
36 4 Long key data cache address
The value is 0 if no long key data is present
40 4 x 4 = 16 Array of data stream sizes
56 4 x 4 = 16 Array of data stream cache addresses
72 4 Cache entry flags
76 4 x 4 = 16 Padding
92 4 Self hash
The hash of the first 92 bytes of the cache entry is this used as a checksum?
96 160 Key data
Contains an UTF-8 encoded string with an end-of-string character with the original URL

Array indexes

The data stream array indexes correspond to:

Value Description
0 HTTP response headers
1 Page contents (Payload)
2
3

Cache entry state

Value Identifier Description
0 ENTRY_NORMAL Normal cache entry
1 ENTRY_EVICTED The cache entry was recently evicted
2 ENTRY_DOOMED The cache entry was doomed

Cache entry flags

Value Identifier Description
0x00000001 PARENT_ENTRY Parent cache entry
0x00000002 CHILD_ENTRY Child cache entry

Rankings node

The rankings node (struct RankingsNode) is 36 bytes in size and consists of:

offset size value description
0 8 Last used
Contains LRU information?
8 8 Last modified
Contains LRU information?
16 4 Next rankings node cache address
20 4 Previous rankings node cache address
24 4 Cache entry cache address
28 4 Is dirty flag
32 4 Self hash
The hash of the first 32 bytes of the rankings node is this used as a checksum?

Data stream

The data stream is stored inside the data block file (data_#), for small amounts of data, or stored as a separate file (f_######). The data stream is stored as a gzip file.

Note that the last modification time of the gzip file is set to 0.

Hash

The hash algorithm used is referred to as SuperFastHash. A pseudo C implementation:

uint32_t SuperFastHash(
          const uint8_t *key,
          size_t key_size )
{ 
    size_t key_index    = 0;
    size_t remainder    = 0;
    uint32_t hash_value = 0;
    uint32_t temp_value = 0;

    if( ( key == NULL ) || ( key_siz 0 ) )
    {
       return( 0 );
    }
    remainder = key_size % 4;
    key_size -= remainder;

    for( key_index = 0;
         key_index < key_size;
         key_index += 4 )
    {
        hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
        temp_value  = key[ key_index + 2 ] + ( key[ key_index + 3 ] << 8 );

        temp_value = ( temp_value << 11 ) ^ hash_value;

        hash_value  = ( hash_value << 16 ) ^ temp_value;
        hash_value += hash_value >> 11;
    }

    switch( remainder )
    {
        case 3:
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
            hash_value ^= hash_value<< 16;
            hash_value ^= key[ key_index + 2 ] << 18;
            hash_value += hash_value >> 11;
            break;

        case 2:
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
            hash_value ^= hash_value << 11;
            hash_value += hash_value >> 17;
            break;

        case 1:
            hash_value += key[ key_index ];
            hash_value ^= hash_value << 10;
            hash_value += hash_value >> 1;
            break;
    }

    /* Force "avalanching" of final 127 bits */
    hash_value ^= hash_value << 3;
    hash_value += hash_value >> 5;
    hash_value ^= hash_value << 4;
    hash_value += hash_value >> 17;
    hash_value ^= hash_value << 25;
    hash_value += hash_value >> 6;

    return hash_value;
}

See Also

External Links