Difference between pages "Forensic Live CD issues" and "Chrome Disk Cache Format"

From ForensicsWiki
(Difference between pages)
Jump to: navigation, search
(Problems)
 
(Data stream)
 
Line 1: Line 1:
== The problem ==
+
{{expand}}
  
[[Live CD|Forensic Live CDs]] are widely used during computer forensic investigations. Currently, many vendors of such Live CD distributions spread false claims that their distributions "do not touch anything", "write protect everything" and so on. Unfortunately, community-developed distributions are no exception here. Finally, it turns out that many Linux-based forensic Live CDs are not tested properly and there are no suitable test cases published.
+
== Cache files ==
 +
The cache is stored in multiple:
 +
{| class="wikitable"
 +
|-
 +
! Filename
 +
! Description
 +
|-
 +
| index
 +
| The index file
 +
|-
 +
| data_#
 +
| Data block files
 +
|-
 +
| f_######
 +
| (Separate) data stream file
 +
|}
  
== Another side of the problem ==
+
== Cache address ==
 +
The cache address is 4 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| <i>If file type is 0 (Separate file)</i>
 +
|
 +
|
 +
|
 +
|-
 +
| 0.0
 +
| 28 bits
 +
|
 +
| File number <br> The value represents the value of # in f_######
 +
|-
 +
| <i>Else</i>
 +
|
 +
|
 +
|
 +
|-
 +
| 0.0
 +
| 16 bits
 +
|
 +
| Block number
 +
|-
 +
| 2.0
 +
| 8 bits
 +
|
 +
| File number (or file selector) <br> The value represents the value of # in data_#
 +
|-
 +
| 3.0
 +
| 2 bits
 +
|
 +
| Block size <br> The number of contiguous blocks where 0 represents 1 block and 3 represents 4 blocks.
 +
|-
 +
| 3.2
 +
| 2 bits
 +
|
 +
| Reserved
 +
|-
 +
| <i>Common</i>
 +
|
 +
|
 +
|
 +
|-
 +
| 3.4
 +
| 3 bits
 +
|
 +
| File type
 +
|-
 +
| 3.7
 +
| 1 bit
 +
|
 +
| Initialized flag
 +
|}
  
Another side of the problem of insufficient testing of forensic Live CDs is that many users do not know what happens "under the hood" of the provided operating system and cannot adequately test them.
+
=== File types ===
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 0
 +
| (Separate) data stream file
 +
|-
 +
| 1
 +
| (Rankings) block data file (36 byte block data file)
 +
|-
 +
| 2
 +
| 256 byte block data file
 +
|-
 +
| 3
 +
| 1024 byte block data file
 +
|-
 +
| 4
 +
| 4096 byte block data file
 +
|-
 +
|
 +
|
 +
|-
 +
| 6
 +
| Unknown; seen on Mac OS  X 0x6f430074
 +
|}
  
=== Example ===
+
==== Examples ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 0x00000000
 +
| Not initialized
 +
|-
 +
| 0x8000002a
 +
| Data stream file: f_00002a
 +
|-
 +
| 0xa0010003
 +
| Block data file: data_1, block number 3, 1 block of size
 +
|}
  
For example, [http://forensiccop.blogspot.com/2009/10/forensic-cop-journal-13-2009.html ''Forensic Cop Journal'' (Volume 1(3), Oct 2009)] describes a test case when an Ext3 file system was mounted using "-o ro" mount flag as a way to write protect the data. The article says that all tests were successful (i.e. no data modification was found after unmounting the file system), but it is known that damaged (i.e not properly unmounted) Ext3 file systems cannot be write protected using only "-o ro" mount flags (write access will be enabled during file system recovery).
+
== Index file format (index) ==
 +
Overview:
 +
* File header
 +
* least recently used (LRU) data (or eviction control data)
 +
* index table
  
And the question is: will many users test damaged Ext3 file system (together with testing the clean one) when validating their favourite forensic Live CD distribution? My answer is "no", because many users are unaware of such traits.
+
=== File header ===
 +
The index file header (struct IndexHeader) is 256 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 4
 +
| "\xc3\xca\x03\xc1"
 +
| Signature
 +
|-
 +
| 4
 +
| 2
 +
|
 +
| Minor version
 +
|-
 +
| 6
 +
| 2
 +
|
 +
| Major version
 +
|-
 +
| 8
 +
| 4
 +
|
 +
| Number of entries
 +
|-
 +
| 12
 +
| 4
 +
|
 +
| Stored data size
 +
|-
 +
| 16
 +
| 4
 +
|
 +
| Last created file number <br> The value represents the value of # in f_######
 +
|-
 +
| 20
 +
| 4
 +
|
 +
| Dirty flag <br> Identifier for all entries being changed
 +
|-
 +
| 24
 +
| 4
 +
|
 +
| Usage statistics data cache address
 +
|-
 +
| 28
 +
| 4
 +
|
 +
| Table size <br> Where 0 represents 0x10000 (is this the same as the file size?)
 +
|-
 +
| 32
 +
| 4
 +
|
 +
| Signals a previous crash
 +
|-
 +
| 36
 +
| 4
 +
|
 +
| Identifier of an ongoing test or experiment
 +
|-
 +
| 40
 +
| 8
 +
|
 +
| Creation time
 +
|-
 +
| 48
 +
| 52 x 8 = 208
 +
|
 +
| Padding <br> Contains 0-byte values
 +
|}
  
== Problems ==
+
==== Format versions ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
|2.0
 +
|
 +
|-
 +
| 2.1
 +
|
 +
|}
  
Each problem is followed by a list of distributions affected (currently this list is not up-to-date).
+
=== Least recently used (LRU) data ===
 +
The least recently used (LRU) data (struct LruData) is 112 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 2 x 4 = 8
 +
|
 +
| Padding
 +
|-
 +
| 8
 +
| 4
 +
|
 +
| Filled flag <br> Value to indicate if when the cache was filled
 +
|-
 +
| 12
 +
| 5 x 4 = 20
 +
|
 +
| Array of sizes
 +
|-
 +
| 32
 +
| 5 x 4 = 20
 +
|
 +
| Array of head cache addresses
 +
|-
 +
| 52
 +
| 5 x 4 = 20
 +
|
 +
| Array of tail cache addresses
 +
|-
 +
| 72
 +
| 4
 +
|
 +
| Transaction cache address <br> Value to indicate an in-flight operation
 +
|-
 +
| 76
 +
| 4
 +
|
 +
| Operation <br> The in-flight operation
 +
|-
 +
| 80
 +
| 4
 +
|
 +
| Operations list <br> The in-flight operations list
 +
|-
 +
| 84
 +
| 7 x 4 = 28
 +
|
 +
| Padding <br> Contains 0-byte values
 +
|}
  
=== Journaling file system updates ===
+
=== Array indexes ===
 +
The array indexes correspond to the file types.
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 0
 +
| Separate file
 +
|-
 +
| 1
 +
| (Rankings) block data file
 +
|-
 +
| 2
 +
| 256 byte block data file
 +
|-
 +
| 3
 +
| 1024 byte block data file
 +
|-
 +
| 4
 +
| 4096 byte block data file
 +
|}
  
When mounting (and unmounting) several journaling file systems with only "-o ro" mount flag a different number of data writes may occur. Here is a list of such file systems:
+
=== Index table ===
 +
The index table is an array of cache addresses.
  
{| class="wikitable" border="1"
+
== Data block file format (data_#) ==
|-
+
Overview:
!  File system
+
* File header
!  When data writes happen
+
* array of blocks
!  Notes
+
|-
+
|  Ext3
+
|  File system requires journal recovery
+
|  To disable recovery: use "noload" flag, or use "ro,loop" flags, or use "ext2" file system type
+
|-
+
|  Ext4
+
|  File system requires journal recovery
+
|  To disable recovery: use "noload" flag, or use "ro,loop" flags, or use "ext2" file system type
+
|-
+
|  ReiserFS
+
File system has unfinished transactions
+
|  "nolog" flag does not work (see ''man mount''). To disable journal updates: use "ro,loop" flags
+
|-
+
|  XFS
+
|  Always (when unmounting)
+
|  "norecovery" flag does not help (fixed in recent 2.6 kernels). To disable data writes: use "ro,loop" flags.
+
|}
+
  
Incorrect mount flags can be used to mount file systems on evidentiary media during the boot process or during the file system preview process. As described above, this may result in data writes to evidentiary media. For example, several Ubuntu-based forensic Live CD distributions mount and recover damaged Ext3/4 file systems on fixed media (e.g. hard drives) during execution of [http://en.wikipedia.org/wiki/Initrd ''initrd''] scripts (these scripts mount every supported file system type on every supported media type using only "-o ro" flag in order to find a root file system image).
+
=== File header ===
 +
The index file header (struct BlockFileHeader) is 8192 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 4
 +
| "\xc3\xca\x04\xc1"
 +
| Signature
 +
|-
 +
| 4
 +
| 2
 +
|
 +
| Minor version
 +
|-
 +
| 6
 +
| 2
 +
|
 +
| Major version
 +
|-
 +
| 8
 +
| 2
 +
|
 +
| File number (or file index) <br> The value represents the value of # in data_#
 +
|-
 +
| 10
 +
| 2
 +
|
 +
| Next file number (or next file index) <br> The value represents the value of # in data_#
 +
|-
 +
| 12
 +
| 4
 +
|
 +
| Block size (or cache entry) size
 +
|-
 +
| 16
 +
| 4
 +
|
 +
| Number of entries
 +
|-
 +
| 20
 +
| 4
 +
|
 +
| Maximum number of entries
 +
|-
 +
| 24
 +
| 4 x 4 = 16
 +
|
 +
| Array of empty entry counters <br> The counters of empty entries for each type
 +
|-
 +
| 40
 +
| 4 x 4 = 16
 +
|
 +
| Array of last used position (hints) <br> The last used position for each type
 +
|-
 +
| 56
 +
| 4
 +
|
 +
| Updating <br> Value to keep track of updates to the header
 +
|-
 +
| 60
 +
| 5 x 4 = 20
 +
|
 +
| User
 +
|-
 +
| 80
 +
| 2028 x 4 = 8112
 +
|
 +
| Block allocation bitmap
 +
|}
  
[[Image:ext3 recovery.png|thumb|right|[[Helix3]]: damaged Ext3 recovery during the boot]]
+
==== Format versions ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Description
 +
|-
 +
| 2.0
 +
|
 +
|-
 +
| 2.1
 +
|
 +
|}
  
List of distributions that recover Ext3 (and sometimes Ext4) file systems during the boot:
+
=== Cache entry ===
 +
The cache entry (struct EntryStore) is 256 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 4
 +
|
 +
| Hash <br> The hash of the key
 +
|-
 +
| 4
 +
| 4
 +
|
 +
| Next entry cache address <br> The next entry with the same hash or bucket
 +
|-
 +
| 8
 +
| 4
 +
|
 +
| Rankings node cache address
 +
|-
 +
| 12
 +
| 4
 +
|
 +
| Reuse count <br> Value that indicates how often this entry was (re-)used
 +
|-
 +
| 16
 +
| 4
 +
|
 +
| Refetch count <br> Value that indicates how often this entry was (re-)fetched (or downloaded)
 +
|-
 +
| 20
 +
| 4
 +
|
 +
| Cache entry state
 +
|-
 +
| 24
 +
| 8
 +
|
 +
| Creation time
 +
|-
 +
| 32
 +
| 4
 +
|
 +
| Key data size
 +
|-
 +
| 36
 +
| 4
 +
|
 +
| Long key data cache address <br> The value is 0 if no long key data is present
 +
|-
 +
| 40
 +
| 4 x 4 = 16
 +
|
 +
| Array of data stream sizes
 +
|-
 +
| 56
 +
| 4 x 4 = 16
 +
|
 +
| Array of data stream cache addresses
 +
|-
 +
| 72
 +
| 4
 +
|
 +
| Cache entry flags
 +
|-
 +
| 76
 +
| 4 x 4 = 16
 +
|
 +
| Padding
 +
|-
 +
| 92
 +
| 4
 +
|
 +
| Self hash <br> The hash of the first 92 bytes of the cache entry is this used as a checksum?
 +
|-
 +
| 96
 +
| 160
 +
|
 +
| Key data <br> Contains an UTF-8 encoded string with an end-of-string character with the original URL
 +
|}
  
{| class="wikitable" border="1"
+
==== Array indexes ====
|-
+
The data stream array indexes correspond to:
! Distribution
+
{| class="wikitable"
! Version
+
|-
|-
+
! Value
| Helix3
+
! Description
| 2009R1
+
|-
|-
+
| 0
| SMART Linux (Ubuntu)
+
| HTTP response headers
|  2010-01-20
+
|-
|-
+
| 1
|  FCCU GNU/Linux Forensic Boot CD
+
| Page contents (Payload)
|  12.1
+
|-
|-
+
| 2
| SPADA
+
|  
| 4
+
|-
|-
+
| 3
| DEFT Linux
+
|  
| 7
+
|}
|}
+
  
=== Orphan inodes deletion ===
+
==== Cache entry state ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Identifier
 +
! Description
 +
|-
 +
| 0
 +
| ENTRY_NORMAL
 +
| Normal cache entry
 +
|-
 +
| 1
 +
| ENTRY_EVICTED
 +
| The cache entry was recently evicted
 +
|-
 +
| 2
 +
| ENTRY_DOOMED
 +
| The cache entry was doomed
 +
|}
  
When mounting Ext3/4 file systems all orphan inodes are removed, even if "-o ro" mount flag was specified. Currently, there is no specific mount flag to disable orphan inodes deletion. The only solution here is to use "-o ro,loop" flags.
+
==== Cache entry flags ====
 +
{| class="wikitable"
 +
|-
 +
! Value
 +
! Identifier
 +
! Description
 +
|-
 +
| 0x00000001
 +
| PARENT_ENTRY
 +
| Parent cache entry
 +
|-
 +
| 0x00000002
 +
| CHILD_ENTRY
 +
| Child cache entry
 +
|}
  
=== Root file system spoofing ===
+
=== Rankings node ===
 +
The rankings node (struct RankingsNode) is 36 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! offset
 +
! size
 +
! value
 +
! description
 +
|-
 +
| 0
 +
| 8
 +
|
 +
| Last used <br> Contains LRU information?
 +
|-
 +
| 8
 +
| 8
 +
|
 +
| Last modified <br> Contains LRU information?
 +
|-
 +
| 16
 +
| 4
 +
|
 +
| Next rankings node cache address
 +
|-
 +
| 20
 +
| 4
 +
|
 +
| Previous rankings node cache address
 +
|-
 +
| 24
 +
| 4
 +
|
 +
| Cache entry cache address
 +
|-
 +
| 28
 +
| 4
 +
|
 +
| Is dirty flag
 +
|-
 +
| 32
 +
| 4
 +
|
 +
| Self hash <br> The hash of the first 32 bytes of the rankings node is this used as a checksum?
 +
|}
  
''See also: [[Early userspace | early userspace]]''
+
== Data stream ==
 +
The data stream is stored inside the data block file (data_#), for small amounts of data, or stored as a separate file (f_######). The data stream is stored as a [[gzip|gzip file]].
  
Most Ubuntu-based forensic Live CD distributions use Casper (a set of scripts used to complete initialization process during early stage of boot). Casper is responsible for searching for a root file system (typically, an image of live environment) on all supported devices (because a bootloader does not pass any information about device used for booting to the kernel), mounting it and executing ''/sbin/init'' program on a mounted root file system that will continue the boot process. Unfortunately, Casper was not designed to meet computer forensics requirements and is responsible for damaged Ext3/4 file systems recovery during the boot (see above) and root file system spoofing.
+
Note that the last modification time of the gzip file is set to 0.
  
[[Image:Grml.png|thumb|right|[[grml]] mounted root file system from the [[hard drive]]]]
+
== Hash ==
 +
The hash algorithm used is referred to as SuperFastHash. A pseudo C implementation:
 +
<pre>
 +
uint32_t SuperFastHash(
 +
          const uint8_t *key,
 +
          size_t key_size )
 +
{
 +
    size_t key_index    = 0;
 +
    size_t remainder    = 0;
 +
    uint32_t hash_value = 0;
 +
    uint32_t temp_value = 0;
  
Currently, Casper may select fake root file system image on evidentiary media (e.g. [[Hard Drive|HDD]]), because there are no authenticity checks performed (except optional UUID check for a possible live file system), and this fake root file system image may be used to execute malicious code during the boot with root privileges. Knoppix-based forensic Live CD distributions are vulnerable to the same attack.
+
    if( ( key == NULL ) || ( key_siz 0 ) )
 +
    {
 +
      return( 0 );
 +
    }
 +
    remainder = key_size % 4;
 +
    key_size -= remainder;
  
List of Ubuntu-based distributions that allow root file system spoofing:
+
    for( key_index = 0;
 +
        key_index < key_size;
 +
        key_index += 4 )
 +
    {
 +
        hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
 +
        temp_value  = key[ key_index + 2 ] + ( key[ key_index + 3 ] << 8 );
  
{| class="wikitable" border="1"
+
        temp_value = ( temp_value << 11 ) ^ hash_value;
|-
+
!  Distribution
+
!  Version
+
|-
+
|  Helix3
+
|  2009R1
+
|-
+
|  Helix3 Pro
+
|  2009R3
+
|-
+
|  CAINE
+
|  1.5
+
|-
+
|  DEFT Linux
+
|  5
+
|-
+
|  Raptor
+
|  2.0
+
|-
+
|  BackTrack
+
|  4
+
|-
+
|  SMART Linux (Ubuntu)
+
|  2010-01-20
+
|-
+
|  FCCU GNU/Linux Forensic Boot CD
+
|  12.1
+
|}
+
  
Vulnerable Knoppix-based distributions include: SPADA, LinEn Boot CD, BitFlare.
+
        hash_value  = ( hash_value << 16 ) ^ temp_value;
 +
        hash_value += hash_value >> 11;
 +
    }
  
[http://anti-forensics.ru/ Anti-Forensics.Ru project] [http://digitalcorpora.org/corp/aor/drives/ released several ISO 9660 images] used to test various Linux Live CD distributions for root file system spoofing (description for all images is [http://anti-forensics.ru/casper/ here]).
+
    switch( remainder )
 +
    {
 +
        case 3:
 +
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
 +
            hash_value ^= hash_value<< 16;
 +
            hash_value ^= key[ key_index + 2 ] << 18;
 +
            hash_value += hash_value >> 11;
 +
            break;
  
=== Swap space activation ===
+
        case 2:
 +
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
 +
            hash_value ^= hash_value << 11;
 +
            hash_value += hash_value >> 17;
 +
            break;
  
''Feel free to add information about swap space activation during the boot in some distributions''
+
        case 1:
 +
            hash_value += key[ key_index ];
 +
            hash_value ^= hash_value << 10;
 +
            hash_value += hash_value >> 1;
 +
            break;
 +
    }
  
=== Incorrect mount policy ===
+
    /* Force "avalanching" of final 127 bits */
 +
    hash_value ^= hash_value << 3;
 +
    hash_value += hash_value >> 5;
 +
    hash_value ^= hash_value << 4;
 +
    hash_value += hash_value >> 17;
 +
    hash_value ^= hash_value << 25;
 +
    hash_value += hash_value >> 6;
  
==== rebuildfstab and scanpartitions scripts ====
+
    return hash_value;
 +
}
 +
</pre>
  
Several forensic Linux Live CD distributions (Helix3 2009R1, Helix3 Pro 2009R3, old versions of CAINE, old versions of grml) use rebuildfstab and scanpartition scripts to create entries for attached devices in ''/etc/fstab''. Some versions of these scripts use wrong wildcards while searching for available block devices (''/dev/?d?'' instead of ''/dev/?d*''), this results in missing several "exotic" devices (like /dev/sdad, /dev/sdad1, etc) and in data writes when mounting them (because fstab lacks of read-only mount options for these devices).
+
== See Also ==
 +
* [[Google Chrome]]
 +
* [[gzip]]
  
=== Incorrect write-blocking approach ===
+
== External Links ==
 +
* [http://www.chromium.org/developers/design-documents/network-stack/disk-cache Disk Cache], The Chromium Projects
  
Some forensic Linux Live CD distributions rely on [[hdparm]] and [[blockdev]] programs to mount file systems in read-only mode (by setting the underlying block device to read-only mode). Unfortunately, setting a block device to read-only mode does not guarantee that [http://oss.sgi.com/archives/xfs/2009-07/msg00213.html no write commands will be passed to the drive]. There were several other bugs related to writing on a read-only device in the past (like [https://lkml.org/lkml/2007/2/6/1 Ext3/4 orphan inodes deletion]). At present, kernel code still disregards read-only mode set on block devices in many places (it should be noted that setting a block device to read-only mode will efficiently write-protect the drive from programs running in userspace, while kernel and its modules still can write anything to the block device, regardless of the read-only mode).
+
[[Category:File Formats]]
 
+
=== TRIM aka discard command ===
+
 
+
== External links ==
+
 
+
* [http://www.computer-forensics-lab.org/pdf/Linux_for_computer_forensic_investigators_2.pdf Linux for computer forensic investigators: problems of booting trusted operating system]
+
* [http://www.computer-forensics-lab.org/pdf/Linux_for_computer_forensic_investigators.pdf Linux for computer forensic investigators: «pitfalls» of mounting file systems]
+
 
+
[[Category:Live CD]]
+

Revision as of 05:17, 22 June 2014

Information icon.png

Please help to improve this article by expanding it.
Further information might be found on the discussion page.

Cache files

The cache is stored in multiple:

Filename Description
index The index file
data_# Data block files
f_###### (Separate) data stream file

Cache address

The cache address is 4 bytes in size and consists of:

offset size value description
If file type is 0 (Separate file)
0.0 28 bits File number
The value represents the value of # in f_######
Else
0.0 16 bits Block number
2.0 8 bits File number (or file selector)
The value represents the value of # in data_#
3.0 2 bits Block size
The number of contiguous blocks where 0 represents 1 block and 3 represents 4 blocks.
3.2 2 bits Reserved
Common
3.4 3 bits File type
3.7 1 bit Initialized flag

File types

Value Description
0 (Separate) data stream file
1 (Rankings) block data file (36 byte block data file)
2 256 byte block data file
3 1024 byte block data file
4 4096 byte block data file
6 Unknown; seen on Mac OS X 0x6f430074

Examples

Value Description
0x00000000 Not initialized
0x8000002a Data stream file: f_00002a
0xa0010003 Block data file: data_1, block number 3, 1 block of size

Index file format (index)

Overview:

  • File header
  • least recently used (LRU) data (or eviction control data)
  • index table

File header

The index file header (struct IndexHeader) is 256 bytes in size and consists of:

offset size value description
0 4 "\xc3\xca\x03\xc1" Signature
4 2 Minor version
6 2 Major version
8 4 Number of entries
12 4 Stored data size
16 4 Last created file number
The value represents the value of # in f_######
20 4 Dirty flag
Identifier for all entries being changed
24 4 Usage statistics data cache address
28 4 Table size
Where 0 represents 0x10000 (is this the same as the file size?)
32 4 Signals a previous crash
36 4 Identifier of an ongoing test or experiment
40 8 Creation time
48 52 x 8 = 208 Padding
Contains 0-byte values

Format versions

Value Description
2.0
2.1

Least recently used (LRU) data

The least recently used (LRU) data (struct LruData) is 112 bytes in size and consists of:

offset size value description
0 2 x 4 = 8 Padding
8 4 Filled flag
Value to indicate if when the cache was filled
12 5 x 4 = 20 Array of sizes
32 5 x 4 = 20 Array of head cache addresses
52 5 x 4 = 20 Array of tail cache addresses
72 4 Transaction cache address
Value to indicate an in-flight operation
76 4 Operation
The in-flight operation
80 4 Operations list
The in-flight operations list
84 7 x 4 = 28 Padding
Contains 0-byte values

Array indexes

The array indexes correspond to the file types.

Value Description
0 Separate file
1 (Rankings) block data file
2 256 byte block data file
3 1024 byte block data file
4 4096 byte block data file

Index table

The index table is an array of cache addresses.

Data block file format (data_#)

Overview:

  • File header
  • array of blocks

File header

The index file header (struct BlockFileHeader) is 8192 bytes in size and consists of:

offset size value description
0 4 "\xc3\xca\x04\xc1" Signature
4 2 Minor version
6 2 Major version
8 2 File number (or file index)
The value represents the value of # in data_#
10 2 Next file number (or next file index)
The value represents the value of # in data_#
12 4 Block size (or cache entry) size
16 4 Number of entries
20 4 Maximum number of entries
24 4 x 4 = 16 Array of empty entry counters
The counters of empty entries for each type
40 4 x 4 = 16 Array of last used position (hints)
The last used position for each type
56 4 Updating
Value to keep track of updates to the header
60 5 x 4 = 20 User
80 2028 x 4 = 8112 Block allocation bitmap

Format versions

Value Description
2.0
2.1

Cache entry

The cache entry (struct EntryStore) is 256 bytes in size and consists of:

offset size value description
0 4 Hash
The hash of the key
4 4 Next entry cache address
The next entry with the same hash or bucket
8 4 Rankings node cache address
12 4 Reuse count
Value that indicates how often this entry was (re-)used
16 4 Refetch count
Value that indicates how often this entry was (re-)fetched (or downloaded)
20 4 Cache entry state
24 8 Creation time
32 4 Key data size
36 4 Long key data cache address
The value is 0 if no long key data is present
40 4 x 4 = 16 Array of data stream sizes
56 4 x 4 = 16 Array of data stream cache addresses
72 4 Cache entry flags
76 4 x 4 = 16 Padding
92 4 Self hash
The hash of the first 92 bytes of the cache entry is this used as a checksum?
96 160 Key data
Contains an UTF-8 encoded string with an end-of-string character with the original URL

Array indexes

The data stream array indexes correspond to:

Value Description
0 HTTP response headers
1 Page contents (Payload)
2
3

Cache entry state

Value Identifier Description
0 ENTRY_NORMAL Normal cache entry
1 ENTRY_EVICTED The cache entry was recently evicted
2 ENTRY_DOOMED The cache entry was doomed

Cache entry flags

Value Identifier Description
0x00000001 PARENT_ENTRY Parent cache entry
0x00000002 CHILD_ENTRY Child cache entry

Rankings node

The rankings node (struct RankingsNode) is 36 bytes in size and consists of:

offset size value description
0 8 Last used
Contains LRU information?
8 8 Last modified
Contains LRU information?
16 4 Next rankings node cache address
20 4 Previous rankings node cache address
24 4 Cache entry cache address
28 4 Is dirty flag
32 4 Self hash
The hash of the first 32 bytes of the rankings node is this used as a checksum?

Data stream

The data stream is stored inside the data block file (data_#), for small amounts of data, or stored as a separate file (f_######). The data stream is stored as a gzip file.

Note that the last modification time of the gzip file is set to 0.

Hash

The hash algorithm used is referred to as SuperFastHash. A pseudo C implementation:

uint32_t SuperFastHash(
          const uint8_t *key,
          size_t key_size )
{ 
    size_t key_index    = 0;
    size_t remainder    = 0;
    uint32_t hash_value = 0;
    uint32_t temp_value = 0;

    if( ( key == NULL ) || ( key_siz 0 ) )
    {
       return( 0 );
    }
    remainder = key_size % 4;
    key_size -= remainder;

    for( key_index = 0;
         key_index < key_size;
         key_index += 4 )
    {
        hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
        temp_value  = key[ key_index + 2 ] + ( key[ key_index + 3 ] << 8 );

        temp_value = ( temp_value << 11 ) ^ hash_value;

        hash_value  = ( hash_value << 16 ) ^ temp_value;
        hash_value += hash_value >> 11;
    }

    switch( remainder )
    {
        case 3:
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
            hash_value ^= hash_value<< 16;
            hash_value ^= key[ key_index + 2 ] << 18;
            hash_value += hash_value >> 11;
            break;

        case 2:
            hash_value += key[ key_index ] + ( key[ key_index + 1 ] << 8 );
            hash_value ^= hash_value << 11;
            hash_value += hash_value >> 17;
            break;

        case 1:
            hash_value += key[ key_index ];
            hash_value ^= hash_value << 10;
            hash_value += hash_value >> 1;
            break;
    }

    /* Force "avalanching" of final 127 bits */
    hash_value ^= hash_value << 3;
    hash_value += hash_value >> 5;
    hash_value ^= hash_value << 4;
    hash_value += hash_value >> 17;
    hash_value ^= hash_value << 25;
    hash_value += hash_value >> 6;

    return hash_value;
}

See Also

External Links