Difference between pages "Open Computer Forensics Architecture" and "Research Topics"

==Research Topics==

Interested in doing research in computer forensics? Looking for a master's topic, or just some ideas for a research paper? Here is our list; please feel free to add your own ideas. Many of these would make a nice master's project.

==Small Programming Projects==

* Modify [[bulk_extractor]] so that it can directly acquire a raw device under Windows. This requires replacing the current ''open'' function call with a ''CreateFile'' function call and using Windows file handles (see the first sketch after this list).
* Create a program that visualizes the contents of a file, sort of like hexedit, but with other features:
** Automatically pull out the strings
** Show a histogram of the byte values (see the second sketch after this list)
** Detect crypto and/or steganography
** (I would write the program in Java with a plug-in architecture)
* Extend [[fiwalk]] to report the NTFS "inodes."
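
A minimal sketch of the Windows raw-device side of the [[bulk_extractor]] change above, assuming the Win32 API; the device path and sector size are illustrative, and opening a physical drive requires administrator privileges:

<pre>
// Sketch: open a raw device on Windows and read one sector.
// "\\\\.\\PhysicalDrive0" is an illustrative device path.
#include <windows.h>
#include <cstdio>

int main() {
    HANDLE h = CreateFileA("\\\\.\\PhysicalDrive0", GENERIC_READ,
                           FILE_SHARE_READ | FILE_SHARE_WRITE,  // required for raw devices
                           NULL, OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "CreateFileA failed: %lu\n", GetLastError());
        return 1;
    }
    // Reads from a raw device must be a multiple of the sector size.
    unsigned char sector[512];
    DWORD got = 0;
    if (ReadFile(h, sector, sizeof(sector), &got, NULL))
        printf("read %lu bytes\n", got);
    CloseHandle(h);
    return 0;
}
</pre>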
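
For the file-visualization idea, a sketch of the histogram and entropy computation that could back the "show histogram" and "detect crypto" features (in C++ rather than the Java suggested above, to match the other examples on this page); a flat histogram with entropy near 8 bits/byte is a common heuristic for encrypted or compressed content:

<pre>
// Sketch: byte-value histogram and Shannon entropy of a file.
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <vector>

int main(int argc, char **argv) {
    if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    std::ifstream in(argv[1], std::ios::binary);
    std::vector<uint64_t> hist(256, 0);
    uint64_t total = 0;
    char c;
    while (in.get(c)) { hist[(unsigned char)c]++; total++; }
    if (total == 0) return 0;

    double entropy = 0.0;
    for (uint64_t n : hist) {
        if (n == 0) continue;
        double p = (double)n / total;
        entropy -= p * std::log2(p);    // Shannon entropy in bits/byte
    }
    printf("%llu bytes, entropy %.3f bits/byte%s\n",
           (unsigned long long)total, entropy,
           entropy > 7.9 ? " (possibly encrypted/compressed)" : "");
    return 0;
}
</pre>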

==Big Programming Projects==

* Write [[Carver 2.0 Planning Page | Carver 2.0]]
* Create a method to detect NTFS-compressed cluster blocks on a disk (raw data stream). One approach is a generic signature that detects the beginning of NTFS-compressed file segments on a disk; such a method would be useful in carving and in scanning for textual strings. A sketch follows.
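
A hedged sketch of what such a signature check might look like. NTFS compresses data with LZNT1, whose chunks begin with a 16-bit header: per the MS-XCA specification, the low 12 bits hold the compressed chunk size minus 3, bits 12-14 hold a signature value of 3, and bit 15 is set for compressed chunks. Scanning cluster boundaries for plausible headers is only a heuristic and will yield false positives; the 4 KiB cluster size below is an assumption:

<pre>
// Heuristic sketch: flag offsets that look like the start of an
// LZNT1-compressed chunk (the compression NTFS uses).
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <vector>

static bool looks_like_lznt1(const uint8_t *p) {
    uint16_t hdr = (uint16_t)(p[0] | (p[1] << 8));  // little-endian header
    if ((hdr & 0x8000) == 0) return false;          // compressed flag not set
    if (((hdr >> 12) & 0x7) != 3) return false;     // signature must be 3
    return true;  // further validation: attempt LZNT1 decompression
}

int main(int argc, char **argv) {
    if (argc != 2) { fprintf(stderr, "usage: %s image\n", argv[0]); return 1; }
    const size_t kCluster = 4096;                   // assumed cluster size
    std::ifstream in(argv[1], std::ios::binary);
    std::vector<uint8_t> buf(kCluster);
    uint64_t off = 0;
    while (in.read((char *)buf.data(), kCluster)) {
        if (looks_like_lznt1(buf.data()))
            printf("candidate LZNT1 chunk at offset %llu\n",
                   (unsigned long long)off);
        off += kCluster;
    }
    return 0;
}
</pre>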
  
==Reverse-Engineering Projects==

* Continue work on the [[Extensible Storage Engine (ESE) Database File (EDB) format]], in particular (see the sketch after this list):
** Fill in the missing information about older ESE databases
** Exchange EDB (MAPI database), STM
** Active Directory (Active Directory working document available on request)
* Continue work on the [[Notes Storage Facility (NSF)]]
* Microsoft SQL Server databases
* Physical-layer access to flash storage:
** Gain access to the physical layer of an SD or USB flash storage device. This will require reverse-engineering the proprietary APIs or gaining access to proprietary information from the manufacturers. Use these APIs to demonstrate the feasibility of recovering residual data that has been overwritten at the logical layer but is still present at the physical layer.
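
For the ESE work, a small sketch of how a tool might first recognize an ESE database before digging into version differences; the field offsets are assumptions taken from the format documentation (checksum at offset 0, magic 0x89ABCDEF at offset 4, format version and file type following) and should be verified against it:

<pre>
// Sketch: identify an ESE database by its header signature.
// Assumes a little-endian host for the direct header read.
#include <cstdint>
#include <cstdio>
#include <fstream>

int main(int argc, char **argv) {
    if (argc != 2) { fprintf(stderr, "usage: %s file.edb\n", argv[0]); return 1; }
    std::ifstream in(argv[1], std::ios::binary);
    uint32_t hdr[4] = {0};  // checksum, magic, format version, file type
    if (!in.read((char *)hdr, sizeof(hdr))) return 1;
    if (hdr[1] != 0x89ABCDEF) {
        printf("not an ESE database\n");
        return 0;
    }
    printf("ESE database: format version 0x%x, file type %u\n", hdr[2], hdr[3]);
    return 0;
}
</pre>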
  
==SleuthKit Enhancements==

[[SleuthKit]] is the popular open-source system for forensics and data recovery.

* Add support for a new file system:
** The [[YAFFS]] [[flash file system]]. (YAFFS2 is currently used on the Google G1 phone; ViaForensics is currently working on this.)
** The [[JFFS2]] [[flash file system]]. (JFFS2 is currently used on the One Laptop Per Child laptop.)
** [[XFAT|exFAT]], Microsoft's new FAT file system.
** [[EXT4]] (JHUAPL is currently working on this). Also see: http://www.williballenthin.com/ext4/
** [[Resilient File System (ReFS)|ReFS]]
* Enhance support for an existing file system:
** Report the physical location on disk of compressed files.
** Add support for NTFS encrypted files (EFS).
** Extend SleuthKit's implementation of NTFS to cover Transactional NTFS (TxF) (see [[NTFS]]).
* Write a FUSE-based mounter for SleuthKit, so that disk images can be forensically mounted using TSK (see the sketch after this list).
* Rewrite '''sorter''' in C++ to make it faster and more flexible.
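
As a starting point for the FUSE mounter, a sketch of the libtsk calls such a tool would wrap; a real mounter would serve these results through FUSE callbacks instead of printing them, and a full-disk image would need the volume-system API (tsk_vs_open) to find partitions rather than the offset-0 assumption used here. A POSIX build is assumed (TSK_TCHAR is char):

<pre>
// Sketch: open a disk image with libtsk and list the root directory --
// the core of what a FUSE-based mounter would wrap in its callbacks.
// Build against SleuthKit, e.g.: g++ list_root.cpp -ltsk
#include <tsk/libtsk.h>
#include <cstdio>

int main(int argc, char **argv) {
    if (argc != 2) { fprintf(stderr, "usage: %s image\n", argv[0]); return 1; }

    TSK_IMG_INFO *img = tsk_img_open_sing(argv[1], TSK_IMG_TYPE_DETECT, 0);
    if (!img) { tsk_error_print(stderr); return 1; }

    // Offset 0 assumes an unpartitioned (or single-volume) image.
    TSK_FS_INFO *fs = tsk_fs_open_img(img, 0, TSK_FS_TYPE_DETECT);
    if (!fs) { tsk_error_print(stderr); tsk_img_close(img); return 1; }

    TSK_FS_DIR *dir = tsk_fs_dir_open_meta(fs, fs->root_inum);
    if (dir) {
        for (size_t i = 0; i < tsk_fs_dir_getsize(dir); i++) {
            TSK_FS_FILE *f = tsk_fs_dir_get(dir, i);
            if (f && f->name && f->name->name)
                printf("%s\n", f->name->name);
            if (f) tsk_fs_file_close(f);
        }
        tsk_fs_dir_close(dir);
    }
    tsk_fs_close(fs);
    tsk_img_close(img);
    return 0;
}
</pre>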
  
==EnCase Enhancement==

* Develop an EnScript that allows you to script EnCase from Python. (You can do this because EnScripts can run arbitrary DLLs. The EnScript calls the DLL; each "return" from the DLL is a specific EnCase command to execute, and the EnScript then re-enters the DLL. A sketch of the DLL side follows.)
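
The DLL half of that round-trip could be as simple as the sketch below. The exported function name and the command vocabulary are hypothetical, not an existing EnCase API; they only illustrate the loop described above, and a real bridge would embed a Python interpreter behind this interface:

<pre>
// Hypothetical sketch of the DLL side of the EnScript <-> Python bridge.
// The EnScript repeatedly calls next_command() with the result of the
// previous command; each returned string names the next EnCase operation
// for the EnScript to perform.
extern "C" __declspec(dllexport)
const char *next_command(const char *last_result) {
    (void)last_result;  // a real bridge would hand this to embedded Python
    static int step = 0;
    switch (step++) {
        case 0: return "open_case";          // illustrative command names
        case 1: return "enumerate_evidence";
        default: return "done";              // tells the EnScript loop to exit
    }
}
</pre>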
  
==Timeline Analysis==

; Timeline Visualization and Analysis
: Write a new timeline viewer that supports log-file fusion (with offsets) and provides the ability to view the log file in the frequency domain (see the sketch below).
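
A sketch of the frequency-domain part: bin event timestamps into a uniform count series, then take its Fourier transform so that periodic activity (cron jobs, malware beaconing) shows up as peaks. A naive O(n^2) DFT is used for clarity where a real viewer would use an FFT library; the one-minute bin width is an assumption:

<pre>
// Sketch: view a timeline in the frequency domain. Reads timestamps
// (seconds, sorted ascending, one per line) from stdin, bins them into
// per-minute event counts, and prints DFT magnitudes.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> ts;
    double t;
    while (scanf("%lf", &t) == 1) ts.push_back(t);
    if (ts.size() < 2) return 0;

    const double kBin = 60.0;                              // one-minute bins
    size_t nbins = (size_t)((ts.back() - ts.front()) / kBin) + 1;
    std::vector<double> counts(nbins, 0.0);
    for (double x : ts) counts[(size_t)((x - ts.front()) / kBin)] += 1.0;

    // Naive DFT magnitude; a peak at frequency k means activity that
    // repeats every nbins/k minutes.
    for (size_t k = 1; k < nbins / 2; k++) {
        double re = 0.0, im = 0.0;
        for (size_t n = 0; n < nbins; n++) {
            double a = 2.0 * M_PI * (double)(k * n) / (double)nbins;
            re += counts[n] * cos(a);
            im -= counts[n] * sin(a);
        }
        printf("period %8.1f min  magnitude %10.2f\n",
               (double)nbins / (double)k, sqrt(re * re + im * im));
    }
    return 0;
}
</pre>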

==Research Areas==

These are research areas that could easily grow into a PhD thesis.

; Stream-based Forensics
: Process the entire disk with one pass to minimize seek time. (You may find it necessary to do a quick metadata scan first.)
; Steganography Detection (general purpose)
: Detect the use of steganography through the analysis of file exemplars and specifications.
; Sanitization Detection
: Detect and diagnose sanitization attempts.
; Compressed Data Reconstruction
: Reconstruct decompressed data from a GZIP file after the first 1K has been removed (see the sketch after this list).
; Evidence Falsification Detection
: Automatically detect falsified digital evidence through inconsistencies in file system allocations, application data allocation, and log-file analysis.
; Visualization of Data/Information in a Digital Forensic Context
: A SWOT analysis of current visualization techniques in forensic tools; improvements; feasibility of 3D representation.
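
For the Compressed Data Reconstruction topic, one known approach is to hunt for a decodable DEFLATE block boundary in the surviving data: try a raw inflate at every byte-and-bit offset and keep the attempt that decodes the most output, with a preset dictionary of spaces standing in for the missing 32 KiB window so that back-references into the lost region come out as blanks rather than errors. A sketch with zlib, under those assumptions:

<pre>
// Sketch: recover text from a DEFLATE stream whose beginning is missing.
// Build with: g++ recover.cpp -lz
#include <zlib.h>
#include <cstdio>
#include <vector>

// Raw-inflate `data` starting `bit` bits in; returns bytes decoded.
static size_t try_decode(const std::vector<unsigned char> &data, int bit,
                         std::vector<unsigned char> &out) {
    std::vector<unsigned char> s(data.size(), 0);
    for (size_t i = 0; i + 1 < data.size(); i++)   // shift stream left by `bit`
        s[i] = (unsigned char)((data[i] >> bit) | (data[i + 1] << (8 - bit)));

    z_stream z = {};
    if (inflateInit2(&z, -15) != Z_OK) return 0;   // -15 = raw DEFLATE
    std::vector<unsigned char> window(32768, ' '); // fake history for lost window
    inflateSetDictionary(&z, window.data(), (uInt)window.size());

    out.assign(1 << 20, 0);
    z.next_in = s.data();    z.avail_in = (uInt)s.size();
    z.next_out = out.data(); z.avail_out = (uInt)out.size();
    inflate(&z, Z_FINISH);                         // decode until error or end
    size_t produced = z.total_out;
    inflateEnd(&z);
    return produced;
}

int main(int argc, char **argv) {
    if (argc != 2) { fprintf(stderr, "usage: %s damaged.deflate\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) return 1;
    std::vector<unsigned char> data;
    unsigned char buf[65536];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) data.insert(data.end(), buf, buf + n);
    fclose(f);

    std::vector<unsigned char> out, best_out;
    size_t best = 0;
    for (size_t off = 0; off < data.size() && off < 4096; off++) {  // search window
        std::vector<unsigned char> tail(data.begin() + off, data.end());
        for (int bit = 0; bit < 8; bit++) {
            size_t got = try_decode(tail, bit, out);
            if (got > best) { best = got; best_out.assign(out.begin(), out.begin() + got); }
        }
    }
    fwrite(best_out.data(), 1, best_out.size(), stdout);
    return 0;
}
</pre>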
  
==Correlation==

* Log-file correlation
* Document identity identification
* Correlation between stored data and intercept data
* Online social network analysis:
** Find and download, in a forensically sound manner, all of the information in a social network (e.g., Facebook, LinkedIn) associated with a targeted individual.
** Determine who is searching for a targeted individual. This might be done with a honeypot, with documents that contain a tracking device, or with some kind of covert Facebook app.
* Automated grouping/annotation of low-level events (e.g., access time, log-file entry) into higher-level events (e.g., program start, login)

==Open Computer Forensics Architecture==

{{Infobox_Software |
  name = OCFA |
  maintainer = [[KLPD]] |
  os = [[Linux]], [[FreeBSD]] |
  genre = {{Analysis}} |
  license = {{GPL}}, {{LGPL}} |
  website = [http://sourceforge.net/projects/ocfa/ sourceforge.net/projects/ocfa/] |
}}

The '''Open Computer Forensics Architecture''' ('''OCFA''') is a modular [[computer forensics framework]] built by the [[KLPD|Dutch National Police Agency]]. Its main goal is to automate the digital forensic process, speeding up the investigation and giving tactical [[investigator]]s direct access to the seized data through an easy-to-use search and browse interface.

The architecture forms an environment where existing forensic [[tools]] and libraries can easily be plugged in and thus made part of the recursive extraction of data and [[metadata]] from digital evidence.

The Open Computer Forensics Architecture aims to be highly modular, robust, fault-tolerant, recursive, and scalable, so that it remains usable in large investigations that span numerous terabytes of evidence data and cover hundreds of evidence items.

For reasons of fault tolerance, modules in OCFA are separate processes. The basic [[OcfaLib API]] makes it possible, and relatively easy, to build an OCFA module out of any data-processing library or tool. OCFA comes with numerous such modules, mostly wrappers around libraries like [[libmagic]] or tools such as those found in the [[Sleuthkit]].

Version 2.2 of OCFA (released April 2009) makes the previously internal [[OCFA treegraph API]] available for OCFA module development. The treegraph API allows more advanced dissectors that produce data and metadata for a treegraph representation of an input file, and it allows dissectors that are programmed to be [[CarvFs]]-aware to use [[zero storage carving]].

Communication between modules within OCFA is governed by a two-layered communication infrastructure. At the lowest layer is a messaging system with the OCFA Anycast Relay at its center; the Anycast Relay provides module crash resistance, distributed processing load balancing, and flow control. At a higher level, the OCFA XML Router routes each individual piece of evidence through the tool chain most appropriate for its particular type of content.

Although OCFA contains a rudimentary user interface, most of its power is in the backend architecture. The last module in the tool chain of any evidence is the OCFA Data Store Module. This module processes the evidence XML (which contains all of the evidence data and its metadata) and stores the relevant parts in a PostgreSQL database. Extending the Apache-based user interface with interfaces for your own case-bound queries should prove very useful in most investigations.

For more information, consult the project site.

== External Links ==
* [http://sourceforge.net/projects/ocfa/ Project site]

__NOTOC__
