National Software Reference Library
The National Software Reference Library (NSRL) is a software collection maintained by the National Institute of Standards and Technology (NIST). The NSRL is a physical resource located in Bethesda, Maryland. It consists of more than 21,000 individual software packages, and NIST holds the original packaging and distribution media for all of them.
The Reference Data Set (RDS) consists of the hash codes of the files in the NSRL. Originally it was created by installing the software on systems and then generating a list of the hash codes. These days it is created by extracting the files directly from installation packages such as Microsoft .CAB and .MSI files and .ZIP archives.
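As an illustration of the extract-and-hash approach, the following Python sketch hashes every file inside a ZIP archive without installing anything. It is not NIST's actual tooling. The function name hash_zip_members is hypothetical, and the choice of SHA-1 and MD5 in uppercase hexadecimal is an assumption about what a consumer of the hashes would want. Only the ZIP case is shown because Python's standard library includes a ZIP reader.

```python
import hashlib
import zipfile


def hash_zip_members(zip_source):
    """Yield (member_name, sha1_hex, md5_hex) for each file in a ZIP archive.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries have no content to hash
            sha1 = hashlib.sha1()
            md5 = hashlib.md5()
            with archive.open(info) as member:
                # Read in chunks so large members need not fit in memory.
                for chunk in iter(lambda: member.read(65536), b""):
                    sha1.update(chunk)
                    md5.update(chunk)
            # Uppercase hex is an assumption made for easy comparison.
            yield info.filename, sha1.hexdigest().upper(), md5.hexdigest().upper()
```

Calling list(hash_zip_members("setup.zip")) on a hypothetical archive would give one tuple per contained file.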
The RDS is typically used for data reduction. That is, the set of hash codes is used as a filter to eliminate or highlight files from examination. Most frequently the RDS is used as a list of known-good files that can be safely suppressed. This usage is incorrect and should be discouraged, because the RDS does not indicate whether a file is good or bad, only that it is known. Indeed, the RDS contains many files that were once thought to be good but are now thought to be bad, for example versions of Adobe Flash with known security vulnerabilities.
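A minimal Python sketch of this kind of data reduction follows. The function names sha1_of_file and partition_by_known are hypothetical, and the sketch assumes the known hashes are already available as a set of uppercase SHA-1 hex strings. The two result lists are deliberately labelled known and unknown rather than good and bad, in line with the caveat above.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the SHA-1 of a file as an uppercase hex string."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def partition_by_known(paths, known_hashes):
    """Split paths into (known, unknown) lists.

    "known" only means the hash appears in known_hashes; it says nothing
    about whether the file is safe.
    """
    known, unknown = [], []
    for path in paths:
        if sha1_of_file(path) in known_hashes:
            known.append(path)
        else:
            unknown.append(path)
    return known, unknown
```

An examiner could then review the unknown list first while still keeping the known list available for highlighting rather than discarding it outright.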
The NSRL is distributed online and can be downloaded from the NSRL website. The most recent release was version 2.24 in March 2009.
RDS File Format
Each RDS consists of several files, but the hashes are stored in NSRLFile.txt. Each NSRLFile.txt has a header line followed by many hash records. The header names the columns in the file. (See the External Links for the complete specification.) RDS files can be used directly with programs like md5deep, FTK, and EnCase.
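Because the header names the columns, a reader can take the column names from the file itself instead of hard-coding their order, which also helps when the layout differs between versions. The Python sketch below assumes the file is comma-separated with quoted fields and that the SHA-1 column is named "SHA-1". Both are assumptions to check against the specification. The function names and the latin-1 encoding (chosen only so that any byte decodes) are likewise illustrative.

```python
import csv


def read_rds_records(path, encoding="latin-1"):
    """Yield one dict per hash record, keyed by the names in the header line."""
    with open(path, newline="", encoding=encoding) as handle:
        # DictReader uses the first line of the file as the column names.
        yield from csv.DictReader(handle)


def load_hash_set(path, column="SHA-1"):
    """Collect one hash column into a set of uppercase hex strings.

    The default column name is an assumption; pass whatever name the
    header of your file actually uses.
    """
    return {record[column].upper() for record in read_rds_records(path)}
```

The records are streamed one at a time, so a large NSRLFile.txt does not have to be held in memory unless the caller builds a set from it.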
The file format has changed slightly over time. The latest version was dated 7 Feb 2007:
Starting in version 2.0, the NSRL moved the hashes to the start of each line and dropped the MD4 hash. The file header:
Information on the older header version is kept here so that programs can read older files. The file header:
OpSystemCode refers to the operating system code. The SpecialCode is a single character that can be used to mark records. A normal file has a blank value here. An M in this field denotes a malicious file.
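The SpecialCode field can be used to keep flagged records out of any "suppress" list. The sketch below assumes each record is a dict keyed by column name (for example, as produced by Python's csv.DictReader) and that the hash column is named "SHA-1". The function name split_by_special_code and the extra "other" bucket for codes not described here are hypothetical.

```python
def split_by_special_code(records, hash_column="SHA-1"):
    """Group record hashes into (normal, malicious, other) sets by SpecialCode."""
    normal, malicious, other = set(), set(), set()
    for record in records:
        code = (record.get("SpecialCode") or "").strip()
        digest = record[hash_column].upper()
        if code == "":
            normal.add(digest)      # blank value: a normal file
        elif code == "M":
            malicious.add(digest)   # M: marked as a malicious file
        else:
            other.add(digest)       # any other marker; meaning not assumed here
    return normal, malicious, other
```

A hash that ends up in the malicious set should be highlighted rather than filtered out, whatever the other columns say.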