Difference between pages "Ddrescue" and "Research Topics"

From ForensicsWiki
{{Infobox_Software |
  name = ddrescue |
  maintainer = [[Antonio Diaz Diaz]]|
  os = {{Linux}}|
  genre = {{Disk imaging}} |
  license = {{GPL}} |
  website = [http://www.gnu.org/software/ddrescue/ddrescue.html http://www.gnu.org/software/ddrescue/ddrescue.html] |
}}
  
'''ddrescue''' is a raw disk imaging tool that "copies data from one file or block device to another, trying hard to rescue data in case of read errors." The application is developed as part of the GNU project and was written with UNIX/Linux in mind.
  
'''ddrescue''' and '''[[dd_rescue]]''' are completely different programs which share no development between them.  The two projects are not related in any way except that they both attempt to enhance the standard [[dd]] tool and coincidentally chose similar names for their new programs.
  
From the [[ddrescue]] info pages:
<blockquote>
GNU ddrescue is a data recovery tool. It copies data from one file or block device (hard disc, cdrom, etc) to another, trying hard to rescue data in case of read errors.<br><br>
  
Ddrescue does not truncate the output file if not asked to. So, every time you run it on the same output file, it tries to fill in the gaps.<br><br>
  
The basic operation of ddrescue is fully automatic. That is, you don't have to wait for an error, stop the program, read the log, run it in reverse mode, etc.<br><br>
 
  
If you use the logfile feature of ddrescue, the data is rescued very efficiently (only the needed blocks are read). Also you can interrupt the rescue at any time and resume it later at the same point.<br><br>
  
Automatic merging of backups: If you have two or more damaged copies of a file, cdrom, etc, and run ddrescue on all of them, one at a time, with the same output file, you will probably obtain a complete and error-free file. This is so because the probability of having damaged areas at the same places on different input files is very low. Using the logfile, only the needed blocks are read from the second and successive copies.
</blockquote>
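The backup-merging behaviour quoted above can be illustrated with a small model. This is a hedged sketch, not ddrescue's actual algorithm. Each damaged copy is a list of blocks in which `None` marks an unreadable block, and a set of rescued block indices stands in for the logfile, so later copies are only read where data is still missing. The helper name `merge_copies` is hypothetical.

```python
from typing import List, Optional, Set, Tuple

Block = Optional[bytes]  # None models a block that could not be read


def merge_copies(copies: List[List[Block]]) -> Tuple[List[Block], Set[int], int]:
    """Toy model of merging several damaged copies into one output.

    The `rescued` set plays the role of the logfile: a block that has
    already been rescued is never read again from a later copy.
    Returns (output blocks, indices of rescued blocks, number of block reads).
    """
    size = len(copies[0])
    output: List[Block] = [None] * size
    rescued: Set[int] = set()
    reads = 0
    for copy in copies:              # one ddrescue run per damaged copy
        for i in range(size):
            if i in rescued:
                continue             # the "logfile" says this block is done
            reads += 1
            if copy[i] is not None:  # this copy is readable at block i
                output[i] = copy[i]
                rescued.add(i)
    return output, rescued, reads


if __name__ == "__main__":
    copy_a = [b"A0", None, b"A2", b"A3"]   # block 1 damaged
    copy_b = [b"B0", b"B1", None, b"B3"]   # block 2 damaged
    out, done, reads = merge_copies([copy_a, copy_b])
    print(out, len(done), reads)
```

In the demo the second copy is consulted for a single block, and the merged output has no gaps because the two copies are damaged in different places. A block that is damaged in every copy stays `None`, which matches the caveat that the result is only "probably" complete.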
  
== Installation ==
  
=== Bootable CD ===
ddrescue is available on bootable rescue CDs such as [http://www.sysresccd.org/Main_Page SystemRescueCd].
=== Debian and Ubuntu ===
The package 'ddrescue' in Debian and Ubuntu is actually [[dd_rescue]], another dd-like program which does not maintain a recovery log. The package containing GNU ddrescue is '''gddrescue'''.
  
Debian
<blockquote>
aptitude install gddrescue
</blockquote>
Ubuntu
<blockquote>
sudo apt-get install gddrescue
</blockquote>
=== Gentoo ===
<blockquote>
emerge ddrescue
</blockquote>
== Partition recovery ==
  
=== Kernel 2.6.3+ & ddrescue 1.4+ ===
'ddrescue --direct' will open the input with the O_DIRECT option for uncached reads. 'raw devices' are not needed on newer kernels. For older kernels see below.
  
First you copy as much data as possible, without retrying or splitting sectors:
<blockquote>
ddrescue --no-split /dev/hda1 imagefile logfile
</blockquote>
  
Now let it retry previous errors 3 times, using uncached reads:
<blockquote>
ddrescue --direct --max-retries=3 /dev/hda1 imagefile logfile
</blockquote>
  
If that fails you can try again but retrimmed, so it tries to reread full sectors:
<blockquote>
ddrescue --direct --retrim --max-retries=3 /dev/hda1 imagefile logfile
</blockquote>
  
You can now use ddrescue (or plain dd) to copy the imagefile to a new partition on a new disk. Use the appropriate filesystem checkers (fsck, CHKDSK) to try to fix errors caused by the bad blocks. Be sure to keep the imagefile around, in case the filesystem is severely broken and data carving tools like testdisk need to be used on the original image.
 
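Because the three passes above differ only in their flags, they lend themselves to scripting. The following is a minimal sketch that builds the exact command lines shown above as argument lists. `rescue_passes` is a hypothetical helper. Actually running the commands (for example with Python's `subprocess.run`) assumes ddrescue 1.4 or later is on the PATH. The third pass is only meant to be run if the second one leaves errors behind.

```python
import shlex
from typing import List


def rescue_passes(device: str, image: str, logfile: str, retries: int = 3) -> List[List[str]]:
    """Return the argv lists for the three passes described above.

    Every pass uses the same image and logfile, so each later pass only
    works on areas that the earlier passes left unfinished.
    """
    common = [device, image, logfile]
    return [
        # Pass 1: copy as much as possible without retrying or splitting.
        ["ddrescue", "--no-split", *common],
        # Pass 2: retry previous errors with uncached (O_DIRECT) reads.
        ["ddrescue", "--direct", f"--max-retries={retries}", *common],
        # Pass 3 (only if pass 2 fails): retrim so full sectors are reread.
        ["ddrescue", "--direct", "--retrim", f"--max-retries={retries}", *common],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in rescue_passes("/dev/hda1", "imagefile", "logfile"):
        print(shlex.join(argv))
```

Printing the commands first is a cheap way to double-check which device is the input before anything touches the disk. To execute a single pass, `subprocess.run(argv, check=True)` could be used.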
=== Before Linux kernel 2.6.3 / 2.4.x ===
In kernel 2.6.3 the 'raw device' interface was marked obsolete. On later kernels, 'ddrescue --direct' opens the input with O_DIRECT to do uncached reads.

First you copy as much data as possible, without retrying or splitting sectors:
<blockquote>
ddrescue --no-split /dev/hda1 imagefile logfile
</blockquote>

Now change over to raw device access. Let it retry previous errors 3 times, and don't read past the last block in the logfile:
<blockquote>
modprobe raw<br>
raw /dev/raw/raw1 /dev/hda1<br>
ddrescue --max-retries=3 --complete-only /dev/raw/raw1 imagefile logfile
</blockquote>

If that fails you can try again (still using raw) but retrimmed, so it tries to reread full sectors:
<blockquote>
ddrescue --retrim --max-retries=3 --complete-only /dev/raw/raw1 imagefile logfile
</blockquote>

You can now use ddrescue (or plain dd) to copy the imagefile to a new partition on a new disk. Use the appropriate filesystem checkers (fsck, CHKDSK) to try to fix errors caused by the bad blocks. Be sure to keep the imagefile around, in case the filesystem is severely broken and data carving tools like testdisk need to be used on the original image.

At the end you may want to unbind the raw device:
<blockquote>
raw /dev/raw/raw1 0 0
</blockquote>

== Examples ==

These two examples are taken directly from the [[ddrescue]] info pages.

Example 1: Rescue an ext2 partition in /dev/hda2 to /dev/hdb2

'''Please Note:''' This will overwrite ALL data on the partition you are copying to. If you do not want that, create an image of the partition to be rescued instead.
<blockquote>
ddrescue -r3 /dev/hda2 /dev/hdb2 logfile<br>
e2fsck -v -f /dev/hdb2<br>
mount -t ext2 -o ro /dev/hdb2 /mnt<br>
</blockquote>

Example 2: Rescue a CD-ROM in /dev/cdrom
<blockquote>
ddrescue -b 2048 /dev/cdrom cdimage logfile
</blockquote>
Then write cdimage to a blank CD-ROM.

This example is derived from the ddrescue manual.

Example 3: Rescue an entire hard disk /dev/sda to another disk /dev/sdb

First copy the error-free areas:
<blockquote>
ddrescue -n /dev/sda /dev/sdb rescue.log
</blockquote>
Then attempt to recover any bad sectors:
<blockquote>
ddrescue -r 1 /dev/sda /dev/sdb rescue.log
</blockquote>
 
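Example 3 relies on the second run reading rescue.log and revisiting only the areas the first run did not finish. As a sketch of that bookkeeping, the parser below lists the unfinished areas in a logfile. It assumes a simplified layout. Lines starting with `#` are comments, and each block line holds a hexadecimal position, a hexadecimal size and a one-character status from the `?*/-+` set listed under `--fill`, where `+` marks finished data. Lines with fewer than three fields (such as a current-position line) are skipped. The exact format can differ between ddrescue versions, so treat this as illustrative. `parse_logfile`, `unfinished` and the hand-written `SAMPLE` are hypothetical.

```python
from typing import List, NamedTuple


class Area(NamedTuple):
    pos: int     # offset in bytes
    size: int    # length in bytes
    status: str  # one of ? * / - +


def parse_logfile(text: str) -> List[Area]:
    """Parse the block lines of a (simplified) ddrescue logfile."""
    areas = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                 # blank line or comment
        fields = line.split()
        if len(fields) < 3:
            continue                 # e.g. the current position/status line
        pos, size, status = fields[:3]
        areas.append(Area(int(pos, 16), int(size, 16), status))
    return areas


def unfinished(areas: List[Area]) -> List[Area]:
    """Areas a later pass would still have to visit (status other than '+')."""
    return [a for a in areas if a.status != "+"]


SAMPLE = """\
# Hand-written example logfile
0x00030000     +
0x00000000  0x00010000  +
0x00010000  0x00000200  -
0x00010200  0x0000FE00  *
0x00020000  0x00010000  +
"""

if __name__ == "__main__":
    todo = unfinished(parse_logfile(SAMPLE))
    for area in todo:
        print(f"{area.pos:#010x} {area.size:#010x} {area.status}")
    print("bytes left:", sum(a.size for a in todo))
```

Summing the sizes of the unfinished areas gives a rough idea of how much work a retry pass such as `ddrescue -r 1` still has ahead of it.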
== Options ==

; -h, --help
: display this help and exit
; -V, --version
: output version information and exit
; -b, --block-size=<bytes>
: hardware block size of input device [512]
; -B, --binary-prefixes
: show binary multipliers in numbers [default SI]
; -c, --cluster-size=<blocks>
: hardware blocks to copy at a time [128]
; -C, --complete-only
: do not read new data beyond logfile limits
; -d, --direct
: use direct disc access for input file
; -D, --synchronous
: use synchronous writes for output file
; -e, --max-errors=<n>
: maximum number of error areas allowed
; -F, --fill=<types>
: fill given type areas with infile data (?*/-+)
; -g, --generate-logfile
: generate approximate logfile from partial copy
; -i, --input-position=<pos>
: starting position in input file [0]
; -n, --no-split
: do not try to split or retry error areas
; -o, --output-position=<pos>
: starting position in output file [ipos]
; -q, --quiet
: quiet operation
; -r, --max-retries=<n>
: exit after given retries (-1=infinity) [0]
; -R, --retrim
: mark all error areas as non-trimmed
; -s, --max-size=<bytes>
: maximum size of data to be copied
; -S, --sparse
: use sparse writes for output file
; -t, --truncate
: truncate output file
; -v, --verbose
: verbose operation

Numbers may be followed by a multiplier: b = blocks, k = kB = 10^3 = 1000, Ki = KiB = 2^10 = 1024, M = 10^6, Mi = 2^20, G = 10^9, Gi = 2^30, etc...
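The multiplier rules above are easy to get wrong when computing sizes for options such as `--max-size`. Here is a small sketch that converts a number with one of the listed multipliers into bytes. It only knows the multipliers spelled out above (`b`, `k`, `Ki`, `M`, `Mi`, `G`, `Gi`). Accepting an optional trailing `B` (as in `MB` or `MiB`) is an assumption extrapolated from the `kB`/`KiB` forms. `parse_size` is a hypothetical helper, not ddrescue's own parser.

```python
import re

# Multipliers listed above; "b" is handled separately because it depends on -b.
SI = {"k": 10**3, "M": 10**6, "G": 10**9}
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}

NUMBER = re.compile(r"(\d+)([A-Za-z]*)")


def parse_size(text: str, block_size: int = 512) -> int:
    """Convert e.g. '64Ki', '10M' or '4b' into a number of bytes."""
    match = NUMBER.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a number: {text!r}")
    value, suffix = int(match.group(1)), match.group(2)
    if suffix == "":
        return value
    if suffix == "b":
        return value * block_size        # b = hardware blocks
    # Assumption: an optional trailing "B", as in "kB" and "KiB".
    prefix = suffix[:-1] if suffix.endswith("B") else suffix
    if prefix in SI:
        return value * SI[prefix]
    if prefix in BINARY:
        return value * BINARY[prefix]
    raise ValueError(f"unknown multiplier: {suffix!r}")


if __name__ == "__main__":
    for s in ["1000", "1k", "1Ki", "2Mi", "128b"]:
        print(s, parse_size(s))
```

Because `b` counts hardware blocks, its value depends on `--block-size`. Combined with the `--cluster-size` default of 128 blocks, `parse_size("128b")` gives 65536 bytes (64 KiB) copied at a time at the default 512-byte block size. For the CD-ROM example, `parse_size("128b", block_size=2048)` gives 262144 bytes.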
== Cygwin ==
As of release 1.4-rc1, ddrescue can be compiled out of the box under [[Cygwin]]. Precompiled packages are available in the [http://cygwin.com/packages/ Cygwin distribution], which makes ddrescue usable on [[Windows]] systems.

== See also ==
* [[aimage]]
* [[Blackbag]]
* [[dcfldd]]
* [[dd]]
* [[dd_rescue]]
* [[sdd]]

== Other Resources ==
* [http://pfuender.net/?p=80 Useful code snippets for ddrescue]
* [http://www.myfixlog.com/fix.php?fid=21 Tutorial for beginners that guides through the process of imaging a hard drive with ddrescue and mounting it in Windows, using free software]

Revision as of 04:13, 25 September 2012

Interested in doing research in computer forensics? Looking for a master's topic, or just some ideas for a research paper? Here is our list. Please feel free to add your own ideas.

Many of these would make a nice master's project.

Programming Projects

Small-Sized Programming Projects

  • Modify bulk_extractor so that it can directly acquire a raw device under Windows. This requires replacing the current open function call with a CreateFile function call and using Windows file handles.
  • Rewrite SleuthKit sorter in C++ to make it faster and more flexible.

Medium-Sized Programming Projects

  • Create a program that visualizes the contents of a file, sort of like hexedit, but with other features:
    • Automatically pull out the strings
    • Show histogram
    • Detect crypto and/or steganography.
  • Extend fiwalk to report NTFS alternate data streams.
  • Create a method to detect NTFS-compressed cluster blocks on a disk (RAW data stream). A method could be to write a generic signature to detect the beginning of NTFS-compressed file segments on a disk. This method is useful in carving and scanning for textual strings.
  • Write a FUSE-based mounter for SleuthKit, so that disk images can be forensically mounted using TSK.
  • Modify SleuthKit's API so that the physical location on disk of compressed files can be learned.
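As a starting point for the file-visualizer project above, the three analyses it names can each be computed in a few lines. This is a hedged sketch. String extraction roughly mimics the Unix `strings` tool's default of at least 4 printable ASCII characters. Shannon entropy is used here as a stand-in for crypto detection, on the assumption that encrypted data comes close to 8 bits of entropy per byte. High entropy does not distinguish encryption from compression, and it is not a steganography detector. The function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Runs of at least `min_len` printable ASCII characters, like `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def byte_histogram(data: bytes) -> List[int]:
    """Count of each byte value 0..255."""
    counts = Counter(data)  # iterating over bytes yields ints
    return [counts[value] for value in range(256)]


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, up to 8.0."""
    if not data:
        return 0.0
    n = len(data)
    return sum(c / n * math.log2(n / c) for c in Counter(data).values())


if __name__ == "__main__":
    sample = b"\x00\x01hello world\xff\xfeABC\x00"
    print(extract_strings(sample))
    print(round(shannon_entropy(sample), 3))
```

A viewer would typically compute the entropy over a sliding window rather than the whole file, so that an encrypted region embedded in otherwise ordinary data stands out.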


Big Programming Projects

  • Develop a new carver with a plug-in architecture and support for fragment reassembly carving (see Carver 2.0 Planning Page). Also, have a look at my (Rainer Poisel) file carver: Multimedia File Carver. It allows for the reassembly of fragmented files (multimedia files and digital images at the moment). Please feel free to contact me in case you want to contribute to this emerging project!
  • Write a new timeline viewer that supports Logfile fusion (with offsets) and provides the ability to view the logfile in the frequency domain.
  • Correlation Engine:
    • Logfile correlation
    • Document identity identification
    • Correlation between stored data and intercept data
    • Online Social Network Analysis
  • Find and download in a forensically secure manner all of the information in a social network (e.g. Facebook, LinkedIn, etc.) associated with a targeted individual.
    • Determine who is searching for a targeted individual. This might be done with a honeypot, or documents with a tracking device in them, or some kind of covert Facebook App.
    • Automated grouping/annotation of low-level events, e.g. access-time, log-file entry, to higher-level events, e.g. program start, login
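For the timeline-viewer item, "logfile fusion with offsets" can be read as merging several time-sorted logs into one timeline after correcting each source's clock by a fixed offset. Below is a minimal sketch of that reading. It uses `heapq.merge`, so the sources are never re-sorted as a whole. The `(name, offset, entries)` input shape and the function names are assumptions made for illustration, and the frequency-domain view is not covered.

```python
import heapq
from datetime import datetime, timedelta
from typing import Iterable, Iterator, List, NamedTuple, Tuple

Entry = Tuple[datetime, str]  # (timestamp as logged, message)


class Event(NamedTuple):
    time: datetime  # timestamp after clock correction
    source: str
    message: str


def _shifted(name: str, offset: timedelta, entries: Iterable[Entry]) -> Iterator[Event]:
    """Apply one source's clock offset to each of its entries."""
    for logged, message in entries:
        yield Event(logged + offset, name, message)


def fuse_logs(logs: List[Tuple[str, timedelta, Iterable[Entry]]]) -> List[Event]:
    """Merge logs that are each already sorted by time into one timeline.

    A constant offset keeps each source sorted, so a k-way merge suffices.
    """
    streams = [_shifted(name, offset, entries) for name, offset, entries in logs]
    return list(heapq.merge(*streams, key=lambda event: event.time))


if __name__ == "__main__":
    t0 = datetime(2012, 9, 25, 4, 0, 0)
    firewall = [(t0, "connection opened"), (t0 + timedelta(seconds=30), "connection closed")]
    web = [(t0 + timedelta(seconds=10), "GET /")]  # assume this clock runs 20 s fast
    for event in fuse_logs([("firewall", timedelta(0), firewall),
                            ("web", timedelta(seconds=-20), web)]):
        print(event.time.isoformat(), event.source, event.message)
```

The offsets are applied through a separate generator function rather than an inline generator expression, so that each stream keeps its own `name` and `offset` instead of all of them seeing the last loop values.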

Reverse-Engineering Projects

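Application analysis

  • Reverse the on-disk structure of the Extensible Storage Engine (ESE) Database File (EDB) format to learn:
    • Fill in the missing information about older ESE databases
    • Exchange EDB (MAPI database), STM
    • Active Directory (Active Directory working document available on request)
  • Reverse the on-disk structure of the Lotus Notes Storage Facility (NSF)
  • Reverse the on-disk structure of Microsoft SQL Server databases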

Volume/File System analysis

  • Analysis of inter-snapshot changes in Windows Shadow Volumes
  • Add support to SleuthKit for exFAT, Microsoft's new FAT file system.
  • Add support to SleuthKit for ReFS.
  • Modify SleuthKit's NTFS implementation to support NTFS encrypted files (EFS).
  • Extend SleuthKit's implementation of NTFS to cover Transactional NTFS (TxF) (see NTFS).
  • Physical layer access to flash storage (requires reverse-engineering proprietary APIs for flash USB and SSD storage).

EnCase Enhancement

  • Develop an EnScript that allows you to script EnCase from Python. (You can do this because EnScripts can run arbitrary DLLs. The EnScript calls the DLL. Each "return" from the DLL is a specific EnCase command to execute. The EnScript then re-enters the DLL.)

Timeline analysis

  • Mapping differences and similarities between multiple versions of a system, such as (but not limited to) those created by Windows Shadow Volumes

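One low-level way to start mapping differences and similarities between two versions of a system is to hash fixed-size blocks and compare the hashes position by position. This assumes both versions are available as raw images of the same volume (for example, two shadow copies exported to files). The 4096-byte block size and SHA-256 are arbitrary choices for this sketch, and the function names are hypothetical. A real tool would also need to map changed blocks back to files through the file system.

```python
import hashlib
from typing import List


def block_hashes(data: bytes, block_size: int = 4096) -> List[str]:
    """SHA-256 digest of each fixed-size block (the last block may be short)."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: bytes, new: bytes, block_size: int = 4096) -> List[int]:
    """Indices of blocks that differ, including blocks present in only one version."""
    a = block_hashes(old, block_size)
    b = block_hashes(new, block_size)
    return [i for i in range(max(len(a), len(b)))
            if i >= len(a) or i >= len(b) or a[i] != b[i]]


if __name__ == "__main__":
    old = b"A" * 8192 + b"B" * 4096
    new = b"A" * 4096 + b"X" * 4096 + b"B" * 4096 + b"C"
    print("changed blocks:", changed_blocks(old, new))
```

The blocks not reported as changed are the "similarities". Storing the per-block hashes of each version also allows more than two versions to be compared without keeping every image in memory.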
Research Areas

These are research areas that could easily grow into a PhD thesis.

  • General-purpose detection of:
    • Steganography
    • Sanitization attempts
    • Evidence falsification (perhaps through inconsistencies in file system allocation and application data allocation, and through log file analysis)
  • Visualization of data/information in digital forensic context
  • SWOT analysis of current visualization techniques in forensic tools, possible improvements, and the feasibility of 3D representation