Difference between pages "NSF DUE-0919593" and "Chrome Disk Cache Format"

From ForensicsWiki
(Difference between pages)
Jump to: navigation, search
 
(Data stream)
 
Line 1: Line 1:
This page includes links to digital forensics resources produced under NSF DUE-0919593, "Creating Realistic Forensic Corpora for Undergraduate Education and Research"
+
{{expand}}
  
'''EDUCATIONAL DATA SETS'''
+
== Cache files ==
 +
The cache is stored in multiple:
 +
{| class="wikitable"
 +
|-
 +
! Filename
 +
! Description
 +
|-
 +
| index
 +
| The index file
 +
|-
 +
| data_#
 +
| Data block files
 +
|-
 +
| f_######
 +
| (Separate) data stream file
 +
|}
  
'''1. 2009-M57 "Patents" scenario'''
+
== Cache address ==
 +
The cache address is 4 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| <i>If file type is 0 (Separate file)</i>
 +
|
 +
|
 +
|
 +
|-
 +
| 0.0
 +
| 28 bits
 +
|
 +
| File number <br> The value represents the value of # in f_######
 +
|-
 +
| <i>Else</i>
 +
|
 +
|
 +
|
 +
|-
 +
| 0.0
 +
| 16 bits
 +
|
 +
| Block number
 +
|-
 +
| 2.0
 +
| 8 bits
 +
|
 +
| File number (or file selector) <br> The value represents the value of # in data_#
 +
|-
 +
| 3.0
 +
| 2 bits
 +
|
 +
| Block size <br> The number of contiguous blocks where 0 represents 1 block and 3 represents 4 blocks.
 +
|-
 +
| 3.2
 +
| 2 bits
 +
|
 +
| Reserved
 +
|-
 +
| <i>Common</i>
 +
|
 +
|
 +
|
 +
|-
 +
| 3.4
 +
| 3 bits
 +
|
 +
| File type
 +
|-
 +
| 3.7
 +
| 1 bit
 +
|
 +
| Initialized flag
 +
|}
  
This scenario involves a small company called M57 which was engaged in prior art searches for patents. The fictional company is contacted by the local police in November 2009 after a person purchases a computer from Craigslist and discovers "kitty porn" on the computer. The police trace the computer back to the M57 company.
+
=== File types ===
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 0
 +
| (Separate) data stream file
 +
|-
 +
| 1
 +
| (Rankings) block data file (36 byte block data file)
 +
|-
 +
| 2
 +
| 256 byte block data file
 +
|-
 +
| 3
 +
| 1024 byte block data file
 +
|-
 +
| 4
 +
| 4096 byte block data file
 +
|-
 +
|
 +
|
 +
|-
 +
| 6
 +
| Unknown; seen on Mac OS  X 0x6f430074
 +
|}
  
The scenario actually involves three separate criminal activities:
+
==== Examples ====
      1 - Exfiltration of proprietary information by an M57 employee.
+
{| class="wikitable"
      2 - Stealing of M57's property and selling it on Craigslist.
+
|-
      3 - The possession of "kitty porn" photos by an M57 employee.
+
! Value
 +
! Description
 +
|-
 +
| 0x00000000
 +
| Not initialized
 +
|-
 +
| 0x8000002a
 +
| Data stream file: f_00002a
 +
|-
 +
| 0xa0010003
 +
| Block data file: data_1, block number 3, 1 block of size
 +
|}
  
This is an involved scenario which has the following information available to students trying to "solve" the case:
+
== Index file format (index) ==
      1 - Disk image of the computer that was sold on Craigs List
+
Overview:
      2 - Disk images of the firm's five computers when the police show up.
+
* File header
      3 - Disk images of the four USB drives that were found on-site belonging to M57 employees
+
* least recently used (LRU) data (or eviction control data)
      4 - The RAM image of each computer just before the disk was imaged.
+
* index table
  
There are approximately 2-4 weeks of use on each computer.
+
=== File header ===
 +
The index file header (struct IndexHeader) is 256 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 4
 +
| "\xc3\xca\x03\xc1"
 +
| Signature
 +
|-
 +
| 4
 +
| 2
 +
|
 +
| Minor version
 +
|-
 +
| 6
 +
| 2
 +
|
 +
| Major version
 +
|-
 +
| 8
 +
| 4
 +
|
 +
| Number of entries
 +
|-
 +
| 12
 +
| 4
 +
|
 +
| Stored data size
 +
|-
 +
| 16
 +
| 4
 +
|
 +
| Last created file number <br> The value represents the value of # in f_######
 +
|-
 +
| 20
 +
| 4
 +
|
 +
| Dirty flag <br> Identifier for all entries being changed
 +
|-
 +
| 24
 +
| 4
 +
|
 +
| Usage statistics data cache address
 +
|-
 +
| 28
 +
| 4
 +
|
 +
| Table size <br> Where 0 represents 0x10000 (is this the same as the file size?)
 +
|-
 +
| 32
 +
| 4
 +
|
 +
| Signals a previous crash
 +
|-
 +
| 36
 +
| 4
 +
|
 +
| Identifier of an ongoing test or experiment
 +
|-
 +
| 40
 +
| 8
 +
|
 +
| Creation time
 +
|-
 +
| 48
 +
| 52 x 8 = 208
 +
|
 +
| Padding <br> Contains 0-byte values
 +
|}
  
'''2. Nitroba University Harassment Scenario'''
+
==== Format versions ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
|2.0
 +
|
 +
|-
 +
| 2.1
 +
|
 +
|}
  
This scenario involves a harassment case at the fictional Nitroba University.
+
=== Least recently used (LRU) data ===
 +
The least recently used (LRU) data (struct LruData) is 112 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 2 x 4 = 8
 +
|
 +
| Padding
 +
|-
 +
| 8
 +
| 4
 +
|
 +
| Filled flag <br> Value to indicate if when the cache was filled
 +
|-
 +
| 12
 +
| 5 x 4 = 20
 +
|
 +
| Array of sizes
 +
|-
 +
| 32
 +
| 5 x 4 = 20
 +
|
 +
| Array of head cache addresses
 +
|-
 +
| 52
 +
| 5 x 4 = 20
 +
|
 +
| Array of tail cache addresses
 +
|-
 +
| 72
 +
| 4
 +
|
 +
| Transaction cache address <br> Value to indicate an in-flight operation
 +
|-
 +
| 76
 +
| 4
 +
|
 +
| Operation <br> The in-flight operation
 +
|-
 +
| 80
 +
| 4
 +
|
 +
| Operations list <br> The in-flight operations list
 +
|-
 +
| 84
 +
| 7 x 4 = 28
 +
|
 +
| Padding <br> Contains 0-byte values
 +
|}
  
Nitroba's IT department has received an email from Lily Tuckrige, a teacher in the Chemistry Department. Tuckrige has been receiving harassing emails and she suspects that they are being sent by a student in her class Chemistry 109, which she is teaching this summer.  The email was received at Tuckridge's personal email account, lilytuckrige@yahoo.com. She took a screenshot of the web browser and sent it in.
+
=== Array indexes ===
 +
The array indexes correspond to the file types.
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 0
 +
| Separate file
 +
|-
 +
| 1
 +
| (Rankings) block data file
 +
|-
 +
| 2
 +
| 256 byte block data file
 +
|-
 +
| 3
 +
| 1024 byte block data file
 +
|-
 +
| 4
 +
| 4096 byte block data file
 +
|}
  
The system administrator who received the complaint wrote back to Tuckridge that Nitroba needed the full headers of the email message. Tuckridge responded by clicking the "Full message headers" button in Yahoo Mail and sent in another screen shot, this one with mail headers.
+
=== Index table ===
 +
The index table is an array of cache addresses.
  
The mail header shows that the mail message originated from the IP address 140.247.62.34, which is a Nitroba student dorm room. Three women share the dorm room. Nitroba provides an Ethernet connection in every dorm room but not Wi-Fi access, so one of the women's friends installed a Wi-Fi router in the room. There is no password on the Wi-Fi.
+
== Data block file format (data_#) ==
 +
Overview:
 +
* File header
 +
* array of blocks
  
Because several email messages appear to come from the IP address, Nitroba decides to place a network sniffer on the ethernet port. All of the packets are logged. On Monday 7/21 Tuckridge received another harassing email. But this time instead of receiving it directly, the perpetrator sent it through a web-based service called
+
=== File header ===
"willselfdestruct.com." The website briefly shows the message to Tuckridge, and then the website reports that the "Message Has Been Destroyed."
+
The index file header (struct BlockFileHeader) is 8192 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 4
 +
| "\xc3\xca\x04\xc1"
 +
| Signature
 +
|-
 +
| 4
 +
| 2
 +
|
 +
| Minor version
 +
|-
 +
| 6
 +
| 2
 +
|
 +
| Major version
 +
|-
 +
| 8
 +
| 2
 +
|
 +
| File number (or file index) <br> The value represents the value of # in data_#
 +
|-
 +
| 10
 +
| 2
 +
|
 +
| Next file number (or next file index) <br> The value represents the value of # in data_#
 +
|-
 +
| 12
 +
| 4
 +
|
 +
| Block size (or cache entry) size
 +
|-
 +
| 16
 +
| 4
 +
|
 +
| Number of entries
 +
|-
 +
| 20
 +
| 4
 +
|
 +
| Maximum number of entries
 +
|-
 +
| 24
 +
| 4 x 4 = 16
 +
|
 +
| Array of empty entry counters <br> The counters of empty entries for each type
 +
|-
 +
| 40
 +
| 4 x 4 = 16
 +
|
 +
| Array of last used position (hints) <br> The last used position for each type
 +
|-
 +
| 56
 +
| 4
 +
|
 +
| Updating <br> Value to keep track of updates to the header
 +
|-
 +
| 60
 +
| 5 x 4 = 20
 +
|
 +
| User
 +
|-
 +
| 80
 +
| 2028 x 4 = 8112
 +
|
 +
| Block allocation bitmap
 +
|}
  
Students are provided with the screen shots, the packets that were collected from the Ethernet tap, and the Chem 109 roster. Their job is to determine if one of the students in the class was responsible for the harassing email and to provide clear, conclusive evidence to support your conclusion.
+
==== Format versions ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 2.0
 +
|
 +
|-
 +
| 2.1
 +
|
 +
|}
  
'''3. M57 Jean'''
+
=== Cache entry ===
 +
The cache entry (struct EntryStore) is 256 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 4
 +
|
 +
| Hash <br> The hash of the key
 +
|-
 +
| 4
 +
| 4
 +
|
 +
| Next entry cache address <br> The next entry with the same hash or bucket
 +
|-
 +
| 8
 +
| 4
 +
|
 +
| Rankings node cache address
 +
|-
 +
| 12
 +
| 4
 +
|
 +
| Reuse count <br> Value that indicates how often this entry was (re-)used
 +
|-
 +
| 16
 +
| 4
 +
|
 +
| Refetch count <br> Value that indicates how often this entry was (re-)fetched (or downloaded)
 +
|-
 +
| 20
 +
| 4
 +
|
 +
| Cache entry state
 +
|-
 +
| 24
 +
| 8
 +
|
 +
| Creation time
 +
|-
 +
| 32
 +
| 4
 +
|
 +
| Key data size
 +
|-
 +
| 36
 +
| 4
 +
|
 +
| Long key data cache address <br> The value is 0 if no long key data is present
 +
|-
 +
| 40
 +
| 4 x 4 = 16
 +
|
 +
| Array of data stream sizes
 +
|-
 +
| 56
 +
| 4 x 4 = 16
 +
|
 +
| Array of data stream cache addresses
 +
|-
 +
| 72
 +
| 4
 +
|
 +
| Cache entry flags
 +
|-
 +
| 76
 +
| 4 x 4 = 16
 +
|
 +
| Padding
 +
|-
 +
| 92
 +
| 4
 +
|
 +
| Self hash <br> The hash of the first 92 bytes of the cache entry is this used as a checksum?
 +
|-
 +
| 96
 +
| 160
 +
|
 +
| Key data <br> Contains an UTF-8 encoded string with an end-of-string character with the original URL
 +
|}
  
The M57-Jean scenario is a single disk image scenario involving the exfiltration of corporate documents from the laptop of a senior executive. The scenario involves a small start-up company, M57.Biz. A few weeks into inception a confidential spreadsheet that contains the names and salaries of the company’s key employees was found posted to the “comments” section of one of the firm’s competitors. The spreadsheet only existed on one of M57′s officers—Jean.
+
==== Array indexes ====
Jean says that she has no idea how the data left her laptop and that she must have been hacked.
+
The data stream array indexes correspond to:
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 0
 +
| HTTP response headers
 +
|-
 +
| 1
 +
| Page contents (Payload)
 +
|-
 +
| 2
 +
|
 +
|-
 +
| 3
 +
|
 +
|}
  
Students are given a disk image of Jean’s laptop. Their job is to figure out how the data was stolen—or if Jean isn’t as innocent as she claims.
+
==== Cache entry state ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Identifier
 +
! Description
 +
|-
 +
| 0
 +
| ENTRY_NORMAL
 +
| Normal cache entry
 +
|-
 +
| 1
 +
| ENTRY_EVICTED
 +
| The cache entry was recently evicted
 +
|-
 +
| 2
 +
| ENTRY_DOOMED
 +
| The cache entry was doomed
 +
|}
  
'''RESEARCH DATA SETS'''
+
==== Cache entry flags ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Identifier
 +
! Description
 +
|-
 +
| 0x00000001
 +
| PARENT_ENTRY
 +
| Parent cache entry
 +
|-
 +
| 0x00000002
 +
| CHILD_ENTRY
 +
| Child cache entry
 +
|}
  
We are also making available an enlarged "research data set" which contains a wealth of information that can be used by students interested in RAM, Network, or Disk Forensics.
+
=== Rankings node ===
 +
The rankings node (struct RankingsNode) is 36 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 8
 +
|
 +
| Last used <br> Contains LRU information?
 +
|-
 +
| 8
 +
| 8
 +
|
 +
| Last modified <br> Contains LRU information?
 +
|-
 +
| 16
 +
| 4
 +
|
 +
| Next rankings node cache address
 +
|-
 +
| 20
 +
| 4
 +
|
 +
| Previous rankings node cache address
 +
|-
 +
| 24
 +
| 4
 +
|
 +
| Cache entry cache address
 +
|-
 +
| 28
 +
| 4
 +
|
 +
| Is dirty flag
 +
|-
 +
| 32
 +
| 4
 +
|
 +
| Self hash <br> The hash of the first 32 bytes of the rankings node is this used as a checksum?
 +
|}
  
The research data set was created at the same time as the 2009-M57 Patents dataset but contains substantially more information:
+
== Data stream ==
  * All of the IP packets in and out of the M57 test network.
+
The data stream is stored inside the data block file (data_#), for small amounts of data, or stored as a separate file (f_######). The data stream is stored as a [[gzip|gzip file]].
  * Daily disk images and RAM captures of each computer on the network.
+
  
This data is not needed to "solve" the scenario, but it might be interesting for students that are:
+
Note that the last modification time of the gzip file is set to 0.
  
  * Interested in learning about RAM analysis and needs a source of RAM images.
+
== Hash ==
  * Interested in network forensics and wants packets.
+
The hash algorithm used is referred to as SuperFastHash. A pseudo C implementation:
  * Interested in writing software that does "disk differencing" or can detect the installation of malware.
+
<pre>
  * Wants examples of how a Windows registry is modified over time with use.
+
uint32_t SuperFastHash(
 +
          const uint8_t *key,
 +
          size_t key_size )
 +
{
 +
    size_t key_index    = 0;
 +
    size_t remainder    = 0;
 +
    uint32_t hash_value = 0;
 +
    uint32_t temp_value = 0;
  
'''OBTAINING THE DATA'''
+
    if( ( key == NULL ) || ( key_siz 0 ) )
 +
    {
 +
      return( 0 );
 +
    }
 +
    remainder = key_size % 4;
 +
    key_size -= remainder;
  
You can obtain our data at the following addresses:
+
    for( key_index = 0;
 +
        key_index < key_size;
 +
        key_index += 4 )
 +
    {
 +
        hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
 +
        temp_value  = key[ key_index + 2 ] + ( key[ key_index + 3 ] << 8 );
  
The M57 Corpus:
+
        temp_value = ( temp_value << 11 ) ^ hash_value;
  * http://torrent.ibiblio.org/doc/187/torrents  (bit torrent form)
+
  * http://domex.nps.edu/corp/scenarios/2009-m57/  (individual files)
+
  
Please download from the iBiblio bittorrent server if possible. There are a number of torrents available for your convenience. If you examine the manifests, you will notice that the files overlap (some disk images appearing in more than one torrent).
+
        hash_value  = ( hash_value << 16 ) ^ temp_value;
 +
        hash_value += hash_value >> 11;
 +
    }
  
Each torrent will place files into:
+
    switch( remainder )
 +
    {
 +
        case 3:
 +
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
 +
            hash_value ^= hash_value<< 16;
 +
            hash_value ^= key[ key_index + 2 ] << 18;
 +
            hash_value += hash_value >> 11;
 +
            break;
  
          [YOUR_LOCAL_DIRECTORY]/torrent_name/corp/scenarios/2009_m57/
+
        case 2:
 +
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
 +
            hash_value ^= hash_value << 11;
 +
            hash_value += hash_value >> 17;
 +
            break;
  
Please seed if possible! The "police materials" torrent references only those materials that would be captured in a raid (e.g. the final day of the scenario).
+
        case 1:
 +
            hash_value += key[ key_index ];
 +
            hash_value ^= hash_value << 10;
 +
            hash_value += hash_value >> 1;
 +
            break;
 +
    }
  
 +
    /* Force "avalanching" of final 127 bits */
 +
    hash_value ^= hash_value << 3;
 +
    hash_value += hash_value >> 5;
 +
    hash_value ^= hash_value << 4;
 +
    hash_value += hash_value >> 17;
 +
    hash_value ^= hash_value << 25;
 +
    hash_value += hash_value >> 6;
  
The 2008-Nitroba corpus:
+
    return hash_value;
  * http://domex.nps.edu/corp/scenarios/2008-nitroba/
+
}
 +
</pre>
 +
 
 +
== See Also ==
 +
* [[Google Chrome]]
 +
* [[gzip]]
 +
 
 +
== External Links ==
 +
* [http://www.chromium.org/developers/design-documents/network-stack/disk-cache Disk Cache], The Chromium Projects
 +
 
 +
[[Category:File Formats]]

Revision as of 05:17, 22 June 2014

Information icon.png

Please help to improve this article by expanding it.
Further information might be found on the discussion page.

Cache files

The cache is stored in multiple:

Filename Description
index The index file
data_# Data block files
f_###### (Separate) data stream file

Cache address

The cache address is 4 bytes in size and consists of:

offset size value description
If file type is 0 (Separate file)
0.0 28 bits File number
The value represents the value of # in f_######
Else
0.0 16 bits Block number
2.0 8 bits File number (or file selector)
The value represents the value of # in data_#
3.0 2 bits Block size
The number of contiguous blocks where 0 represents 1 block and 3 represents 4 blocks.
3.2 2 bits Reserved
Common
3.4 3 bits File type
3.7 1 bit Initialized flag

File types

Value Description
0 (Separate) data stream file
1 (Rankings) block data file (36 byte block data file)
2 256 byte block data file
3 1024 byte block data file
4 4096 byte block data file
6 Unknown; seen on Mac OS X 0x6f430074

Examples

Value Description
0x00000000 Not initialized
0x8000002a Data stream file: f_00002a
0xa0010003 Block data file: data_1, block number 3, 1 block of size

Index file format (index)

Overview:

  • File header
  • least recently used (LRU) data (or eviction control data)
  • index table

File header

The index file header (struct IndexHeader) is 256 bytes in size and consists of:

offset size value description
0 4 "\xc3\xca\x03\xc1" Signature
4 2 Minor version
6 2 Major version
8 4 Number of entries
12 4 Stored data size
16 4 Last created file number
The value represents the value of # in f_######
20 4 Dirty flag
Identifier for all entries being changed
24 4 Usage statistics data cache address
28 4 Table size
Where 0 represents 0x10000 (is this the same as the file size?)
32 4 Signals a previous crash
36 4 Identifier of an ongoing test or experiment
40 8 Creation time
48 52 x 8 = 208 Padding
Contains 0-byte values

Format versions

Value Description
2.0
2.1

Least recently used (LRU) data

The least recently used (LRU) data (struct LruData) is 112 bytes in size and consists of:

offset size value description
0 2 x 4 = 8 Padding
8 4 Filled flag
Value to indicate if when the cache was filled
12 5 x 4 = 20 Array of sizes
32 5 x 4 = 20 Array of head cache addresses
52 5 x 4 = 20 Array of tail cache addresses
72 4 Transaction cache address
Value to indicate an in-flight operation
76 4 Operation
The in-flight operation
80 4 Operations list
The in-flight operations list
84 7 x 4 = 28 Padding
Contains 0-byte values

Array indexes

The array indexes correspond to the file types.

Value Description
0 Separate file
1 (Rankings) block data file
2 256 byte block data file
3 1024 byte block data file
4 4096 byte block data file

Index table

The index table is an array of cache addresses.

Data block file format (data_#)

Overview:

  • File header
  • array of blocks

File header

The index file header (struct BlockFileHeader) is 8192 bytes in size and consists of:

offset size value description
0 4 "\xc3\xca\x04\xc1" Signature
4 2 Minor version
6 2 Major version
8 2 File number (or file index)
The value represents the value of # in data_#
10 2 Next file number (or next file index)
The value represents the value of # in data_#
12 4 Block size (or cache entry) size
16 4 Number of entries
20 4 Maximum number of entries
24 4 x 4 = 16 Array of empty entry counters
The counters of empty entries for each type
40 4 x 4 = 16 Array of last used position (hints)
The last used position for each type
56 4 Updating
Value to keep track of updates to the header
60 5 x 4 = 20 User
80 2028 x 4 = 8112 Block allocation bitmap

Format versions

Value Description
2.0
2.1

Cache entry

The cache entry (struct EntryStore) is 256 bytes in size and consists of:

offset size value description
0 4 Hash
The hash of the key
4 4 Next entry cache address
The next entry with the same hash or bucket
8 4 Rankings node cache address
12 4 Reuse count
Value that indicates how often this entry was (re-)used
16 4 Refetch count
Value that indicates how often this entry was (re-)fetched (or downloaded)
20 4 Cache entry state
24 8 Creation time
32 4 Key data size
36 4 Long key data cache address
The value is 0 if no long key data is present
40 4 x 4 = 16 Array of data stream sizes
56 4 x 4 = 16 Array of data stream cache addresses
72 4 Cache entry flags
76 4 x 4 = 16 Padding
92 4 Self hash
The hash of the first 92 bytes of the cache entry is this used as a checksum?
96 160 Key data
Contains an UTF-8 encoded string with an end-of-string character with the original URL

Array indexes

The data stream array indexes correspond to:

Value Description
0 HTTP response headers
1 Page contents (Payload)
2
3

Cache entry state

Value Identifier Description
0 ENTRY_NORMAL Normal cache entry
1 ENTRY_EVICTED The cache entry was recently evicted
2 ENTRY_DOOMED The cache entry was doomed

Cache entry flags

Value Identifier Description
0x00000001 PARENT_ENTRY Parent cache entry
0x00000002 CHILD_ENTRY Child cache entry

Rankings node

The rankings node (struct RankingsNode) is 36 bytes in size and consists of:

offset size value description
0 8 Last used
Contains LRU information?
8 8 Last modified
Contains LRU information?
16 4 Next rankings node cache address
20 4 Previous rankings node cache address
24 4 Cache entry cache address
28 4 Is dirty flag
32 4 Self hash
The hash of the first 32 bytes of the rankings node is this used as a checksum?

Data stream

The data stream is stored inside the data block file (data_#), for small amounts of data, or stored as a separate file (f_######). The data stream is stored as a gzip file.

Note that the last modification time of the gzip file is set to 0.

Hash

The hash algorithm used is referred to as SuperFastHash. A pseudo C implementation:

uint32_t SuperFastHash(
          const uint8_t *key,
          size_t key_size )
{ 
    size_t key_index    = 0;
    size_t remainder    = 0;
    uint32_t hash_value = 0;
    uint32_t temp_value = 0;

    if( ( key == NULL ) || ( key_siz 0 ) )
    {
       return( 0 );
    }
    remainder = key_size % 4;
    key_size -= remainder;

    for( key_index = 0;
         key_index < key_size;
         key_index += 4 )
    {
        hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
        temp_value  = key[ key_index + 2 ] + ( key[ key_index + 3 ] << 8 );

        temp_value = ( temp_value << 11 ) ^ hash_value;

        hash_value  = ( hash_value << 16 ) ^ temp_value;
        hash_value += hash_value >> 11;
    }

    switch( remainder )
    {
        case 3:
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
            hash_value ^= hash_value<< 16;
            hash_value ^= key[ key_index + 2 ] << 18;
            hash_value += hash_value >> 11;
            break;

        case 2:
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
            hash_value ^= hash_value << 11;
            hash_value += hash_value >> 17;
            break;

        case 1:
            hash_value += key[ key_index ];
            hash_value ^= hash_value << 10;
            hash_value += hash_value >> 1;
            break;
    }

    /* Force "avalanching" of final 127 bits */
    hash_value ^= hash_value << 3;
    hash_value += hash_value >> 5;
    hash_value ^= hash_value << 4;
    hash_value += hash_value >> 17;
    hash_value ^= hash_value << 25;
    hash_value += hash_value >> 6;

    return hash_value;
}

See Also

External Links