Difference between revisions of "Research Topics"

From ForensicsWiki
m (SleuthKit Enhancements)
(See Also)
 
(33 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
Interested in doing research in computer forensics? Looking for a master's topic, or just some ideas for a research paper? Here is our list. Please feel free to add your own ideas.
 
Interested in doing research in computer forensics? Looking for a master's topic, or just some ideas for a research paper? Here is our list. Please feel free to add your own ideas.
  
==Short-Term Engineering Projects==
+
Many of these would make a nice master's project.
These projects would make a nice master's thesis or the start of a PhD.
+
  
; Physical layer access to flash storage.
+
=Programming/Engineering Projects=
: Gain access to the physical layer of SD or USB flash storage device. This will require reverse-engineering the proprietary APIs or gaining access to proprietary information from the manufacturers. Use these APIs to demonstrate the feasibility of recovering residual data that has been overwritten at the logical layer but which is still present at the physical layer.
+
  
===SleuthKit Enhancements===
+
==Small-Sized Projects==
[[SleuthKit]] is the popular open-source system for forensics and data recovery.
+
; Sleuthkit:
* Add support for a new file system:
+
* Rewrite SleuthKit '''sorter''' in C++ to make it faster and more flexible.
** The [[YAFFS]] [[flash file system]]. (YAFFS2 is currently used on the Google G1 phone.)
+
; tcpflow:
** The [[JFFS2]] [[flash file system]]. (JFFS2 is currently used on the One Laptop Per Child laptop.)
+
* Modify [[tcpflow]]'s iptree.h implementation so that it only stores discriminating bit prefixes in the tree, similar to D. J. Bernstein's [http://cr.yp.to/critbit.html Crit-bit] trees.
** [[XFAT]], Microsoft's new FAT file system.
+
* Determine why [[tcpflow]]'s iptree.h implementation's ''prune'' works differently when caching is enabled than when it is disabled.
** [[EXT4]]
+
 
* Enhance support for an existing file system:
+
==Medium-Sized Non-Programming Projects==
** Report the physical location on disk of compressed files.
+
===Digital Forensics Education===
** Add support for NTFS encrypted files (EFS)
+
* Survey existing DFE programs and DF practitioners regarding which tools they use. Report if the tools being taught are the same as the tools that are being used.
** Extend SleuthKit's implementation of NTFS to cover Transaction NTFS (TxF) (see [[NTFS]])
+
===Improving quality of forensic examination reports===
 +
* Defense asks you: "When did you update your antivirus program during the forensic examination?", what will you reply: date, or date/hour, or date/hour/minute? How many virus signatures can be added and then excluded as false positives in 24 hours? Does mirroring of signature update servers make date/hour, date/hour/minute answers useless?
 +
 
 +
==Medium-Sized Development Projects==
 +
===Forensic File Viewer ===
 +
* Create a program that visualizes the contents of a file, sort of like hexedit, but with other features:
 +
** Automatically pull out the strings
 +
** Show histogram
 +
** Detect crypto and/or steganography.
 +
* Extend SleuthKit's [[fiwalk]] to report the NTFS alternate data streams.
 +
 
 +
===Data Sniffing===
 +
* Create a method to detect NTFS-compressed cluster blocks on a disk (RAW data stream). A method could be to write a generic signature to detect the beginning of NTFS-compressed file segments on a disk. This method is useful in carving and scanning for textual strings.
 +
 
 +
===SleuthKit Modifications===
 
* Write a FUSE-based mounter for SleuthKit, so that disk images can be forensically mounted using TSK.
 
* Write a FUSE-based mounter for SleuthKit, so that disk images can be forensically mounted using TSK.
* Rewrite '''sorter''' in C++ to make it faster and more flexible.
+
* Modify SleuthKit's API so that the physical location on disk of compressed files can be learned.
  
===Timeline Analysis===
+
===Anti-Forensics Detection===
; Timeline Visualization and Analysis
+
* A pluggable rule-based system that can detect the residual data or other remnants of running a variety of anti-forensics software
: Write a new timeline viewer that supports Logfile fusion (with offsets) and provides the ability to view the logfile in the frequency domain.
+
; Changed Clocks
+
: Detect a system that has had its clock changed.
+
  
==Research Areas==
+
===Carvers===
These are research areas that could easily grow into a PhD thesis.
+
Develop a new carver with a plug-in architecture and support for fragment reassembly carving. Take a look at:
; Stream-based Forensics
+
* [[Carver 2.0 Planning Page]]
: Process the entire disk with one pass to minimize seek time. (You may find it necessary to do a quick metadata scan first.)
+
* ([mailto:rainer.poisel@gmail.com Rainer Poisel']) [https://github.com/rpoisel/mmc Multimedia File Carver], which allows for the reassembly of multimedia fragmented files.
; Steganography Detection (general purpose)
+
: Detect the use of steganography through the analysis of file exemplars and specifications.
+
; Sanitization Detection
+
: Detect and diagnose sanitization attempts.
+
; Compressed Data Reconstruction
+
: Reconstruct decompressed data from a GZIP file after the first 1K has been removed.
+
;Evidence Falsification Detection
+
: Automatically detect falsified digital evidence through the use of inconsistency in file system allocations, application data allocation, and log file analysis.
+
; Visualization of data/information in digital forensic context
+
: SWOT of current visualization techniques in forensic tools; improvements; feasibility of 3D representation;
+
  
==Correlation==
+
===Correlation Engine===
 
* Logfile correlation
 
* Logfile correlation
 
* Document identity identification
 
* Document identity identification
 
* Correlation between stored data and intercept data
 
* Correlation between stored data and intercept data
 
* Online Social Network Analysis
 
* Online Social Network Analysis
** Find and download in a forensically secure manner all of the information in a social network (e.g. Facebook, LinkedIn, etc.) associated with a targeted individual.
+
 
** Determine who is searching for a targeted individual. This might be done with a honeypot, or documents with a tracking device in them, or some kind of covert Facebook App.
+
===Data Snarfing/Web Scraping===
 +
* Find and download in a forensically secure manner all of the information in a social network (e.g. Facebook, LinkedIn, etc.) associated with a targeted individual.
 +
* Determine who is searching for a targeted individual. This might be done with a honeypot, or documents with a tracking device in them, or some kind of covert Facebook App.
 
* Automated grouping/annotation of low-level events, e.g. access-time, log-file entry, to higher-level events, e.g. program start, login
 
* Automated grouping/annotation of low-level events, e.g. access-time, log-file entry, to higher-level events, e.g. program start, login
  
==Programming Projects==
+
=== Timeline analysis ===
===File Visualization===
+
* Mapping differences and similarities in multiple versions of a system, e.g. those created by [[Windows Shadow Volumes]] but not limited to
Write a program that visualizes the contents of a file, sort of like hexedit, but with other features:
+
* Write a new timeline viewer that supports Logfile fusion (with offsets) and provides the ability to view the logfile in the frequency domain.
* Automatically pull out the strings
+
* Show histogram
+
* Detect crypto and/or steganography.
+
I would write the program in java with a plug-in architecture.
+
  
===Carving===
+
===Enhancements for Guidance Software's EnCase===
* Write [[Carver 2.0 Planning Page | Carver 2.0]]
+
* Develop an EnScript that allows you to script EnCase from Python. (You can do this because EnScripts can run arbitrary DLLs. The EnScript calls the DLL. Each "return" from the DLL is a specific EnCase command to execute. The EnScript then re-enters the DLL.)
* Create a method to detect NTFS-compressed cluster blocks on a disk (RAW data stream). A method could be to write a generic signature to detect the beginning of NTFS-compressed file segments on a disk. This method is useful in carving and scanning for textual strings.
+
  
===fiwalk Enhancements===
+
=== Analysis of packet captures ===
* Rewrite the metadata extraction system.
+
* Identifying various types of DDoS attacks from capture files (pcap): extracting attack statistics, list of attacking bots, determining the type of attack (TCP SYN flood, UDP/ICMP flood, HTTP GET/POST flood, HTTP flood with browser emulation, etc).
* Extend [[fiwalk]] to report the NTFS "inodes."
+
  
==File Format analysis==
+
==Reverse-Engineering Projects==
Analysis of file format for forensic artefacts; could be combined with programming to build code that parses the format.
+
=== Application analysis ===
 
+
* Reverse the on-disk structure of the [[Extensible Storage Engine (ESE) Database File (EDB) format]] to learn:
* Continue work on the [[Extensible Storage Engine (ESE) Database File (EDB) format]] in regard to
+
 
** Fill in the missing information about older ESE databases
 
** Fill in the missing information about older ESE databases
 
** Exchange EDB (MAPI database), STM
 
** Exchange EDB (MAPI database), STM
 
** Active Directory (Active Directory working document available on request)
 
** Active Directory (Active Directory working document available on request)
* Continue work on the [[Notes Storage Facility (NSF)]] (code available on request)
+
* Reverse the on-disk structure of the Lotus [[Notes Storage Facility (NSF)]]
* Microsoft SQL Server databases
+
* Reverse the on-disk structure of Microsoft SQL Server databases
 +
 
 +
=== Volume/File System analysis ===
 +
* Analysis of inter-snapshot changes in [[Windows Shadow Volumes]]
 +
* Modify SleuthKit's NTFS implementation to support NTFS encrypted files (EFS)
 +
* Extend SleuthKit's implementation of NTFS to cover Transaction NTFS (TxF) (see [[NTFS]])
 +
* Physical layer access to flash storage (requires reverse-engineering proprietary APIs for flash USB and SSD storage.)
 +
* Add support to SleuthKit for [[Resilient File System (ReFS)|ReFS]].
 +
 
 +
 
 +
 
 +
==Error Rates==
 +
* Develop improved techniques for identifying encrypted data. (It's especially important to distinguish encrypted data from compressed data).
 +
* Quantify the error rate of different forensic tools and processes. Are these rates theoretical or implementation dependent? What is the interaction of the error rates and the [[Daubert]] standard?
 +
 
 +
==Research Areas==
 +
These are research areas that could easily grow into a PhD thesis.
 +
* General-purpose detection of:
 +
** Steganography
 +
** Sanitization attempts
 +
** Evidence falsification (perhaps through inconsistency in file system allocations, application data allocation, and log file analysis)
 +
* Visualization of data/information in digital forensic context
 +
* SWOT of current visualization techniques in forensic tools; improvements; feasibility of 3D representation;
 +
 
 +
==See Also==
 +
* [http://itsecurity.uiowa.edu/securityday/documents/guan.pdf Digital Forensics: Research Challenges and Open Problems, Dr. Yong Guan, Iowa State University, Dec. 4, 2007]
 +
* [http://www.forensicfocus.com/project-ideas Forensic Focus: Project Ideas for Digital Forensics Students]
  
 
__NOTOC__
 
__NOTOC__
 +
 +
[[Category:Research]]

Latest revision as of 08:35, 10 September 2013

Interested in doing research in computer forensics? Looking for a master's topic, or just some ideas for a research paper? Here is our list. Please feel free to add your own ideas.

Many of these would make a nice master's project.

Programming/Engineering Projects

Small-Sized Projects

SleuthKit
  • Rewrite SleuthKit sorter in C++ to make it faster and more flexible.
tcpflow
  • Modify tcpflow's iptree.h implementation so that it only stores discriminating bit prefixes in the tree, similar to D. J. Bernstein's Crit-bit trees.
  • Determine why tcpflow's iptree.h implementation's prune works differently when caching is enabled than when it is disabled.
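
To make the crit-bit idea concrete, here is a minimal sketch of a crit-bit set for IPv4 addresses. It is not tcpflow's iptree.h code; the class names `CritBitSet` and `_Node` are hypothetical, and the sketch stores whole addresses only (no per-prefix counters or pruning). Each internal node records just the position of the single bit that discriminates its two subtrees.

```python
import ipaddress


class _Node:
    """Internal node: records only the bit position where its subtrees differ."""
    __slots__ = ("bit", "child")

    def __init__(self, bit):
        self.bit = bit             # 31 is the most significant bit of an IPv4 address
        self.child = [None, None]  # child[0]: bit is 0, child[1]: bit is 1


class CritBitSet:
    """Hypothetical crit-bit set of IPv4 addresses (illustration only)."""

    def __init__(self):
        self.root = None  # None, an int leaf, or a _Node

    def _closest_leaf(self, key):
        # Follow the key's bits at each discriminating position down to a leaf.
        node = self.root
        while isinstance(node, _Node):
            node = node.child[(key >> node.bit) & 1]
        return node

    def __contains__(self, addr):
        key = int(ipaddress.IPv4Address(addr))
        return self.root is not None and self._closest_leaf(key) == key

    def add(self, addr):
        """Insert an address; return False if it was already present."""
        key = int(ipaddress.IPv4Address(addr))
        if self.root is None:
            self.root = key
            return True
        leaf = self._closest_leaf(key)
        if leaf == key:
            return False
        # Highest bit at which the new key differs from its closest leaf.
        crit = (leaf ^ key).bit_length() - 1
        # Descend again, stopping above the first node that tests a lower bit.
        parent, node = None, self.root
        while isinstance(node, _Node) and node.bit > crit:
            parent, node = node, node.child[(key >> node.bit) & 1]
        new = _Node(crit)
        side = (key >> crit) & 1
        new.child[side] = key
        new.child[1 - side] = node
        if parent is None:
            self.root = new
        else:
            parent.child[(key >> parent.bit) & 1] = new
        return True
```

With n distinct addresses the structure holds exactly n - 1 internal nodes, which is the space saving the project item is after.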

Medium-Sized Non-Programming Projects

Digital Forensics Education

  • Survey existing DFE programs and DF practitioners regarding which tools they use. Report whether the tools being taught are the same as the tools being used.

Improving quality of forensic examination reports

  • Suppose the defense asks you: "When did you update your antivirus program during the forensic examination?" How precisely should you reply: date, date/hour, or date/hour/minute? How many virus signatures can be added and then excluded as false positives within 24 hours? Does mirroring of signature update servers make date/hour and date/hour/minute answers useless?

Medium-Sized Development Projects

Forensic File Viewer

  • Create a program that visualizes the contents of a file, sort of like hexedit, but with other features:
    • Automatically pull out the strings
    • Show histogram
    • Detect crypto and/or steganography.
  • Extend SleuthKit's fiwalk to report the NTFS alternate data streams.
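
As a starting point for the viewer's analysis features, the following sketch shows string extraction, a byte histogram, and a Shannon entropy estimate. The function names are hypothetical, and entropy is only a rough hint: high values are consistent with encrypted data but also with compressed data, so this is not a crypto or steganography detector by itself.

```python
import math
import re
from collections import Counter


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def byte_histogram(data: bytes) -> list:
    """Return a 256-entry list of byte-value counts."""
    counts = Counter(data)
    return [counts.get(value, 0) for value in range(256)]


def shannon_entropy(data: bytes) -> float:
    """Return the entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(data).values())
```

A viewer could call these per file or per fixed-size window and plot the results next to the hex view.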

Data Sniffing

  • Create a method to detect NTFS-compressed cluster blocks on a disk (RAW data stream). A method could be to write a generic signature to detect the beginning of NTFS-compressed file segments on a disk. This method is useful in carving and scanning for textual strings.
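
One possible form of such a signature is sketched below. It assumes the LZNT1 chunk header layout as commonly documented (a 16-bit little-endian value with the compressed flag in bit 15, the signature value 3 in bits 12 to 14, and the chunk size minus 3 in bits 0 to 11); verify this against the format specification before relying on it. The function names and the cluster-aligned scan are hypothetical choices, and the check is a heuristic that will produce false positives and will miss streams that begin with an uncompressed chunk.

```python
import struct


def lznt1_chunk_chain(data: bytes, offset: int = 0, min_chunks: int = 3) -> bool:
    """Return True if min_chunks plausible compressed chunk headers chain from offset."""
    pos = offset
    for _ in range(min_chunks):
        if pos + 2 > len(data):
            return False
        (header,) = struct.unpack_from("<H", data, pos)
        signature = (header >> 12) & 0x7   # assumed to be 3 for LZNT1
        compressed = header >> 15          # assumed flag: 1 = compressed chunk
        if signature != 3 or not compressed:
            return False
        # Assumed: low 12 bits hold the chunk size (header included) minus 3.
        pos += (header & 0x0FFF) + 3
    return True


def find_candidates(data: bytes, cluster_size: int = 4096) -> list:
    """Return cluster-aligned offsets where a plausible chunk chain starts."""
    return [off for off in range(0, len(data), cluster_size)
            if lznt1_chunk_chain(data, off)]
```

Requiring several chained headers rather than a single two-byte match is what keeps the false-positive rate manageable on raw disk data.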

SleuthKit Modifications

  • Write a FUSE-based mounter for SleuthKit, so that disk images can be forensically mounted using TSK.
  • Modify SleuthKit's API so that the physical location on disk of compressed files can be learned.

Anti-Forensics Detection

  • A pluggable rule-based system that can detect the residual data or other remnants of running a variety of anti-forensics software.

Carvers

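Develop a new carver with a plug-in architecture and support for fragment reassembly carving. Take a look at:

  • Carver 2.0 Planning Page
  • Rainer Poisel's Multimedia File Carver (https://github.com/rpoisel/mmc), which allows for the reassembly of fragmented multimedia files.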

Correlation Engine

  • Logfile correlation
  • Document identity identification
  • Correlation between stored data and intercept data
  • Online Social Network Analysis

Data Snarfing/Web Scraping

  • Find and download in a forensically secure manner all of the information in a social network (e.g. Facebook, LinkedIn, etc.) associated with a targeted individual.
  • Determine who is searching for a targeted individual. This might be done with a honeypot, or documents with a tracking device in them, or some kind of covert Facebook App.
  • Automated grouping/annotation of low-level events (e.g. access time, log-file entry) into higher-level events (e.g. program start, login).
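
The grouping step in the last item can be sketched as time-gap clustering followed by rule-based labelling. Everything here is an assumption made for illustration: the `Event` record, the source names, the two-second gap, and the rule table are hypothetical, and a real system would need far richer rules.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since some common epoch
    source: str       # hypothetical source name, e.g. "atime" or "prefetch"
    detail: str


def group_events(events, max_gap=2.0):
    """Cluster events whose neighbouring timestamps are at most max_gap apart."""
    groups = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if groups and ev.timestamp - groups[-1][-1].timestamp <= max_gap:
            groups[-1].append(ev)
        else:
            groups.append([ev])
    return groups


# Hypothetical rules: a group gets a label if it contains all listed sources.
RULES = [
    (frozenset({"prefetch", "atime"}), "program start"),
    (frozenset({"auth_log", "wtmp"}), "login"),
]


def label_group(group):
    """Return the first matching high-level label, or None."""
    sources = {ev.source for ev in group}
    for required, label in RULES:
        if required <= sources:
            return label
    return None
```

Each labelled group can then be shown as one annotated entry, with the low-level events available on demand.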

Timeline analysis

  • Mapping differences and similarities in multiple versions of a system, including, but not limited to, those created by Windows Shadow Volumes.
  • Write a new timeline viewer that supports Logfile fusion (with offsets) and provides the ability to view the logfile in the frequency domain.
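
The two viewer features can be sketched as follows, under the assumption that each log is already parsed into time-sorted (timestamp, message) pairs and that the clock offset of each source is known. The function names are hypothetical. The spectrum uses a naive discrete Fourier transform of per-bucket event counts, which is adequate for illustration but slow for large inputs.

```python
import cmath
import heapq


def fuse_logs(logs, offsets):
    """Merge time-sorted logs into one timeline after applying per-log offsets.

    logs:    dict mapping log name -> list of (timestamp, message), sorted
    offsets: dict mapping log name -> seconds to add to that log's clock
    """
    def corrected(name):
        shift = offsets.get(name, 0.0)
        for ts, msg in logs[name]:
            yield (ts + shift, name, msg)

    return list(heapq.merge(*(corrected(name) for name in logs)))


def event_spectrum(timestamps, bucket=1.0):
    """Return DFT magnitudes of event counts per time bucket (naive O(n^2))."""
    if not timestamps:
        return []
    start = min(timestamps)
    n = int((max(timestamps) - start) // bucket) + 1
    counts = [0] * n
    for ts in timestamps:
        counts[int((ts - start) // bucket)] += 1
    return [abs(sum(c * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, c in enumerate(counts)))
            for k in range(n // 2 + 1)]
```

Peaks in the spectrum away from index 0 suggest periodic activity, such as a scheduled task or beaconing, that is hard to spot in a linear timeline.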

Enhancements for Guidance Software's EnCase

  • Develop an EnScript that allows you to script EnCase from Python. (You can do this because EnScripts can run arbitrary DLLs. The EnScript calls the DLL. Each "return" from the DLL is a specific EnCase command to execute. The EnScript then re-enters the DLL.)

Analysis of packet captures

  • Identify various types of DDoS attacks from capture files (pcap): extract attack statistics and the list of attacking bots, and determine the type of attack (TCP SYN flood, UDP/ICMP flood, HTTP GET/POST flood, HTTP flood with browser emulation, etc.).
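
For the TCP SYN flood case, a first-pass statistic might look like the sketch below. It assumes the capture has already been parsed into (source address, TCP flags byte) pairs by some other tool; the function name and both thresholds are hypothetical and would need tuning against real traffic. Spoofed source addresses, which are common in SYN floods, defeat the per-source view, so this is only one of several statistics worth extracting.

```python
from collections import Counter

SYN = 0x02  # TCP flag bits
ACK = 0x10


def syn_flood_suspects(packets, min_syns=100, max_ack_ratio=0.1):
    """Return sorted sources that send many bare SYNs and few bare ACKs.

    packets: iterable of (source_address, tcp_flags) pairs
    """
    syns, acks = Counter(), Counter()
    for src, flags in packets:
        if flags & SYN and not flags & ACK:
            syns[src] += 1   # connection attempt
        elif flags & ACK and not flags & SYN:
            acks[src] += 1   # handshake completion or established traffic
    return sorted(src for src, count in syns.items()
                  if count >= min_syns and acks[src] / count <= max_ack_ratio)
```

A source that completes its handshakes produces roughly as many ACKs as SYNs, so a low ACK-to-SYN ratio at high volume is the signal used here.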

Reverse-Engineering Projects

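Application analysis

  • Reverse the on-disk structure of the Extensible Storage Engine (ESE) Database File (EDB) format to learn:
    • Fill in the missing information about older ESE databases
    • Exchange EDB (MAPI database), STM
    • Active Directory (Active Directory working document available on request)
  • Reverse the on-disk structure of the Lotus Notes Storage Facility (NSF)
  • Reverse the on-disk structure of Microsoft SQL Server databases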

Volume/File System analysis

  • Analysis of inter-snapshot changes in Windows Shadow Volumes
  • Modify SleuthKit's NTFS implementation to support NTFS encrypted files (EFS)
  • Extend SleuthKit's implementation of NTFS to cover Transaction NTFS (TxF) (see NTFS)
  • Physical layer access to flash storage (requires reverse-engineering proprietary APIs for flash USB and SSD storage).
  • Add support to SleuthKit for ReFS.


Error Rates

  • Develop improved techniques for identifying encrypted data. (It's especially important to distinguish encrypted data from compressed data.)
  • Quantify the error rate of different forensic tools and processes. Are these rates theoretical or implementation dependent? What is the interaction of the error rates and the Daubert standard?
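
One baseline that improved techniques would be measured against is a chi-square test of the byte distribution against a uniform distribution. The sketch below computes only the statistic; the function name is hypothetical and no decision threshold is given, because any threshold is empirical and is exactly where the error rate comes from. Well-encrypted data tends to look uniform, while compressed data often deviates slightly, but the two overlap and this test alone does not separate them reliably.

```python
from collections import Counter


def chi_square_uniform(data: bytes) -> float:
    """Return the chi-square statistic of byte counts against a uniform model."""
    if not data:
        raise ValueError("data must not be empty")
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(value, 0) - expected) ** 2 / expected
               for value in range(256))
```

Running the statistic over labelled samples of encrypted and compressed blocks, and counting misclassifications at each candidate threshold, gives a direct way to quantify the error rate of this one heuristic.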

Research Areas

These are research areas that could easily grow into a PhD thesis.

  • General-purpose detection of:
    • Steganography
    • Sanitization attempts
    • Evidence falsification (perhaps through inconsistency in file system allocations, application data allocation, and log file analysis)
  • Visualization of data/information in a digital forensic context
  • SWOT analysis of current visualization techniques in forensic tools; improvements; feasibility of 3D representation

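See Also

  • Digital Forensics: Research Challenges and Open Problems, Dr. Yong Guan, Iowa State University, Dec. 4, 2007 (http://itsecurity.uiowa.edu/securityday/documents/guan.pdf)
  • Forensic Focus: Project Ideas for Digital Forensics Students (http://www.forensicfocus.com/project-ideas)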