Difference between pages "TDMA" and "Forensic corpora"

From ForensicsWiki
{{Wikify}}
This page describes large-scale corpora of forensically interesting information that are available for those involved in forensic research.
  
'''TDMA - Time Division Multiple Access'''
= Disk Images =
;''The Harvard/MIT Drive Image Corpus.'' Between 1998 and 2006, [[Simson Garfinkel|Garfinkel]] acquired 1250+ hard drives on the secondary market. These hard drive images have proven invaluable in a range of studies, such as the development of new forensic techniques and research into the sanitization practices of computer users.
  
* Garfinkel, S. and Shelat, A., [http://www.simson.net/clips/academic/2003.IEEE.DiskDriveForensics.pdf "Remembrance of Data Passed: A Study of Disk Sanitization Practices,"] IEEE Security and Privacy, January/February 2003.
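Sanitization studies of this kind start from a basic question: how much of a recovered drive still holds data? The Python sketch below is a minimal illustration, not the method used in the cited study. It assumes a raw (dd-style) image file and 512-byte sectors, and the function name `zero_sector_fraction` is hypothetical. It reports the fraction of sectors that contain only zero bytes; a value near 1.0 suggests the drive was zero-filled, while a low value suggests that much of the drive still holds data (or was overwritten with a non-zero pattern).

```python
SECTOR_SIZE = 512  # assumption: 512-byte sectors; many newer drives use 4096


def zero_sector_fraction(path: str, sector_size: int = SECTOR_SIZE) -> float:
    """Return the fraction of sectors in a raw image that contain only zero bytes."""
    zero_sector = bytes(sector_size)
    total = zeros = 0
    with open(path, "rb") as image:
        while True:
            sector = image.read(sector_size)
            if not sector:  # end of image
                break
            total += 1
            # A short final sector is compared against a zero run of equal length.
            if sector == zero_sector[: len(sector)]:
                zeros += 1
    return zeros / total if total else 0.0
```

Usage would look like `zero_sector_fraction("drive.dd")`, where `drive.dd` is a hypothetical image file name. Reading one sector at a time keeps memory use constant even for very large images.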
  
== Why use TDMA?==
;''The Honeynet Project Forensic Challenge.'' In 2001 the Honeynet Project distributed a set of disk images and asked participants to conduct a forensic analysis of a compromised computer. Entries were judged and posted for all to see. The disk images and write-ups are still available online.
TDMA (Time Division Multiple Access) is used in the largest available networks in the world. It is a digital communication method that allows many users to access a single communication channel. Each user is given a unique time slot within the channel, so users take turns transmitting instead of sending at the same moment. This increases transmission efficiency by letting several users share a single channel. A significant benefit is that TDMA can easily be adapted to data transmission as well as voice communication. TDMA has been chosen for several digital cellular standards because it enables features that are vital to system operation in an advanced cellular or PCS environment.
  
TDMA can carry data rates from 64 kbps to 120 Mbps, which supports services such as fax, voiceband data, and SMS, as well as bandwidth-intensive applications. TDMA also extends the mobile device's battery life, since the handset transmits only during its own time slot, that is, for only a portion of the time during a conversation. In addition, TDMA is the most cost-effective technology for upgrading an analog system to digital (http://www.iec.org/online/tutorials/tdma/topic04.html).
* [http://www.honeynet.org/challenge/index.html The Honeynet Project's Forensic Challenge], January 2001.
  
==How it works==
;''The [http://www.cfreds.nist.gov/ Computer Forensic Reference Data Sets]'' project from [[National Institute of Standards and Technology|NIST]] hosts a few sample cases that may be useful for examiners to practice with.
  
TDMA relies on the fact that the audio signal has been digitized. The digitized signal is divided into segments a few milliseconds long, and each segment is placed in a time slot. TDMA is also the access technique used in the European digital cellular standard, GSM, and the Japanese digital standard, personal digital cellular (PDC). A single channel can carry several conversations if each conversation is divided into relatively short fragments, each conversation is assigned a time slot, and the fragments are transmitted in synchronized time. For instance, if four people (Jan, Tom, Bill, and Bob) are making calls, each is assigned a time slot on a single channel. However, if Bob stopped using his phone, his time slot would still be reserved for him, even though it carries no conversation (http://www.iec.org/online/tutorials/tdma/topic04.html).
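The four-caller example can be modeled in a few lines of Python. This is a toy sketch, not a model of GSM or PDC framing: it assumes each frame has exactly one slot per caller, treats each speech fragment as an opaque byte string, and uses hypothetical function names. Slot k of every frame belongs to the k-th caller, so when Bob hangs up his slot is still sent, just empty.

```python
from typing import Dict, List, Optional

Frame = List[Optional[bytes]]  # one entry per slot; None means the slot is sent empty


def build_frames(streams: Dict[str, List[bytes]]) -> List[Frame]:
    """Interleave each caller's speech fragments into fixed TDMA time slots.

    Slot k of every frame is reserved for the k-th caller (dict order), so an
    idle caller's slot is transmitted empty instead of being reused.
    """
    callers = list(streams)
    n_frames = max((len(frags) for frags in streams.values()), default=0)
    frames: List[Frame] = []
    for i in range(n_frames):
        frames.append([streams[c][i] if i < len(streams[c]) else None for c in callers])
    return frames


def receive(frames: List[Frame], slot: int) -> List[bytes]:
    """Reassemble one caller's fragments by reading only that caller's slot."""
    return [frame[slot] for frame in frames if frame[slot] is not None]


def slot_utilization(frames: List[Frame]) -> float:
    """Fraction of transmitted slots that actually carried a fragment."""
    slots = [s for frame in frames for s in frame]
    return sum(s is not None for s in slots) / len(slots) if slots else 0.0
```

With four slots per frame, each handset transmits during only one slot in four, which is where the battery-life advantage mentioned above comes from. If Jan, Tom, and Bill each send two fragments while Bob is silent, every frame still carries Bob's empty slot and only 75% of the slots are used; that is the bandwidth waste described under Pros & Cons.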
* http://www.cfreds.nist.gov/Hacking_Case.html
  
==Pros & Cons==
;''The PyFlag standard test image set''
* http://pyflag.sourceforge.net/Documentation/tutorials/howtos/test_image.html
  
TDMA can be wasteful of bandwidth because time slots are allocated to specific conversations whether or not anyone is speaking at a given moment. An enhanced version, ETDMA, attempts to correct this problem. Unlike TDMA, which reserves a slot for each subscriber regardless of whether that subscriber is transmitting, ETDMA assigns slots to subscribers dynamically and uses the pauses that occur in normal speech to send data. When a subscriber has something to transmit, a single bit is placed in a buffer queue. The system scans the buffer, notices that the user has something to transmit, and allocates bandwidth accordingly; if a subscriber has nothing to transmit, the system moves on to the next subscriber. This technique can be up to 10 times more efficient than analog transmission.
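The dynamic assignment described above can be sketched as a round-robin scan over per-subscriber request bits. This is a simplified illustration, not the actual ETDMA protocol. It assumes each subscriber has one request bit (1 means something is queued), that a frame has a fixed number of slots, and that clearing a bit once its data is sent happens elsewhere; the function name `allocate_frame` is hypothetical. Idle subscribers are skipped, and the scan resumes where it stopped so that no subscriber is permanently favored.

```python
from typing import List, Tuple


def allocate_frame(
    request_bits: List[int], slots_per_frame: int, start: int = 0
) -> Tuple[List[int], int]:
    """Give this frame's slots only to subscribers whose request bit is set.

    Scans subscribers round-robin from `start`, skipping those with nothing to
    send. Returns (subscribers that got a slot, index to resume from next frame).
    """
    n = len(request_bits)
    assigned: List[int] = []
    i = start
    for _ in range(n):  # look at each subscriber at most once per frame
        if len(assigned) == slots_per_frame:
            break  # frame is full; remaining requests wait for the next frame
        if request_bits[i]:
            assigned.append(i)
        i = (i + 1) % n
    return assigned, i
```

For request bits `[1, 0, 0, 1, 1, 0]` and four slots per frame, only subscribers 0, 3, and 4 receive slots, whereas fixed TDMA would reserve a slot for all six subscribers, three of them idle.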
= Network Packets and Traces =
  
==Cell Phone Providers==
== DARPA ID Eval ==
  
Several cell phone companies compete to sell their phones and advertise their network coverage. They fall primarily into two categories: TDMA and [[CDMA]]. The companies that offer TDMA are AT&T, Cingular, Nextel, and T-Mobile. The companies that support [[CDMA]] are ALLTEL, Amp'd Mobile, Cricket Wireless, ESPN, Qwest, Sprint, Verizon, and Virgin Mobile. More of these companies support [[CDMA]], which raises the question of what makes the two technologies so different.
''The DARPA Intrusion Detection Evaluation.'' In 1998, 1999 and 2000 the Information Systems Technology Group at MIT Lincoln Laboratory created a test network complete with simulated servers, clients, clerical workers, programmers, and system managers. Baseline traffic was collected. The systems on the network were then “attacked” by simulated hackers. Some of the attacks were well-known at the time, while others were developed for the purpose of the evaluation.
  
* [http://www.ll.mit.edu/IST/ideval/data/1998/1998_data_index.html 1998 DARPA Intrusion Detection Evaluation]
* [http://www.ll.mit.edu/IST/ideval/data/1999/1999_data_index.html 1999 DARPA Intrusion Detection Evaluation]
* [http://www.ll.mit.edu/IST/ideval/data/2000/2000_data_index.html 2000 DARPA Intrusion Detection Scenario Specific]
  
==TDMA Vs. [[CDMA]]==
== WIDE==
''The [http://www.wide.ad.jp/project/wg/mawi.html MAWI Working Group] of the [http://www.wide.ad.jp/ WIDE Project]'' maintains a [http://tracer.csl.sony.co.jp/mawi/ Traffic Archive]. In it you will find:
* daily trace of a trans-Pacific T1 line;
* daily trace at an IPv6 line connected to 6Bone;
* daily trace at another trans-Pacific line (100Mbps link) in operation since 2006/07/01.
  
TDMA is better for international plans and arguably has better battery life. [[CDMA]] proponents claim that it has better battery life and coverage. In the US, Cingular is the largest TDMA carrier. The wireless world has debated the merits of TDMA and [[CDMA]] ever since [[CDMA]] was introduced in 1989. Supporters of [[CDMA]] claim that its bandwidth efficiency is up to 13 times that of TDMA and 20 to 40 times that of analog transmission.
Traffic traces are captured with tcpdump, and the IP addresses in the traces are then scrambled with a modified version of [[tcpdpriv]].
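The modified tcpdpriv used by MAWI is not reproduced here. As a hedged illustration of what prefix-preserving address scrambling means, the Python sketch below follows the general construction used by the Crypto-PAn scheme, with HMAC-SHA256 standing in for Crypto-PAn's AES-based pseudorandom function; the key and function names are hypothetical. Each output bit is the input bit XORed with a keyed function of the bits before it, so addresses that share a k-bit prefix (for example, the same /24 subnet) still share exactly a k-bit prefix after scrambling, and the mapping depends on a secret key.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Scramble an IPv4 address while preserving shared prefixes (sketch).

    Output bit i = input bit i XOR f(first i input bits), where f is the low
    bit of HMAC-SHA256 keyed with `key`. Equal prefixes get equal flips.
    """
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out_bits = []
    for i in range(32):
        prefix = bits[:i].encode("ascii")  # the i bits that precede bit i
        flip = hmac.new(key, prefix, hashlib.sha256).digest()[0] & 1
        out_bits.append("1" if (bits[i] == "1") ^ flip else "0")
    return str(ipaddress.IPv4Address(int("".join(out_bits), 2)))


def common_prefix_length(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()
```

`common_prefix_length` is included only to check the property: for any key, a pair of real addresses and their scrambled versions have the same common prefix length.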
  
Those who favor TDMA point out that there has been no successful major trial of [[CDMA]] technology that supports the capacities it claims. They also argue that the theoretical improvements in bandwidth efficiency claimed for [[CDMA]] are now being approached by enhancements to TDMA technology, whose evolution is expected to allow 20- to 40-fold capacity increases over analog in the near future. [[CDMA]] is also expensive, requiring $300,000 per base station compared with $80,000 for TDMA. Lastly, TDMA is the proven leader as the most economical digital migration path for existing AMPS networks. No one has the final word in this debate, but TDMA supporters expect it to remain the dominant technology in the wireless market.
==Wireshark==
The open source Wireshark project (formerly known as Ethereal) has a website with many network packet captures:
* http://wiki.wireshark.org/SampleCaptures
  
== External Links ==
==NFS Packets==
The Storage Networking Industry Association has a set of network file system traces that can be downloaded from:
* http://iotta.snia.org/traces
* http://tesla.hpl.hp.com/public_software/
  
http://en.wikipedia.org/wiki/Time_division_multiple_access <br>
=Text Files=
http://www.iec.org/online/tutorials/tdma/topic04.html <br>
==Email messages==
http://www.cellphoneinfo.com/index.html
 
''The Enron Corpus'' is a collection of email messages that were seized by the Federal Energy Regulatory Commission during its investigation of Enron.

* http://www.cs.cmu.edu/~enron
* http://www.enronemail.com/

==Log files==
[http://crawdad.cs.dartmouth.edu/index.php CRAWDAD] is a community archive for wireless data.

[http://www.caida.org/data/ CAIDA] collects a wide variety of data.

[http://www.dshield.org/howto.html DShield] asks users to submit firewall logs.

==Text for Text Retrieval==
The [http://trec.nist.gov Text REtrieval Conference (TREC)] has made available a series of [http://trec.nist.gov/data.html text collections].

==American National Corpus==
The [http://www.americannationalcorpus.org/ American National Corpus (ANC) project] is creating a massive collection of American English from 1990 onward. The goal is to create a corpus of at least 100 million words that is comparable to the British National Corpus.

==British National Corpus==
The [http://www.natcorp.ox.ac.uk/ British National Corpus (BNC)] is a 100-million-word collection of written and spoken English from a variety of sources.

=Voice=
==CALLFRIEND==
CALLFRIEND is a database of recorded English conversations. A total of 60 recorded conversations are available from the University of Pennsylvania at a cost of $600.

==TalkBank==
TalkBank is an online database of spoken language. The project was originally funded between 1999 and 2004 by two National Science Foundation grants; ongoing support is provided by two NSF grants and one NIH grant.

==Augmented Multi-Party Interaction Corpus==
The [http://corpus.amiproject.org/ AMI Meeting Corpus] has 100 hours of meeting recordings.

==Other Corpora==
The [http://corpus.canterbury.ac.nz/ Canterbury Corpus] is a set of files used for testing lossless compression algorithms. The corpus consists of 11 natural files, 4 artificial files, 3 large files, and a file with the first million digits of pi. The website also hosts a copy of the Calgary Corpus, which was the de facto standard for testing lossless compression algorithms in the 1990s.
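A common way to use a compression corpus is to compress each file and report the result in bits per byte (bits per character for text). The Python sketch below assumes the corpus has already been downloaded and unpacked into a local directory, and uses the standard library's zlib, bz2, and lzma modules as stand-ins for whatever algorithm is under test; `benchmark` and `bits_per_byte` are hypothetical names. Because the algorithms are lossless, the sketch also checks that every file decompresses back to its original bytes.

```python
import bz2
import lzma
import zlib
from pathlib import Path
from typing import Callable, Dict, Tuple

# Each codec is a (compress, decompress) pair from the standard library.
CODECS: Dict[str, Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]] = {
    "zlib": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bz2": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def bits_per_byte(original_size: int, compressed_size: int) -> float:
    """Compressed size in bits per input byte (8.0 means no savings)."""
    return 8 * compressed_size / original_size if original_size else 0.0


def benchmark(corpus_dir: str) -> Dict[str, Dict[str, float]]:
    """Compress every file in corpus_dir with each codec and report bits per byte."""
    results: Dict[str, Dict[str, float]] = {}
    for path in sorted(Path(corpus_dir).iterdir()):
        if not path.is_file():
            continue
        data = path.read_bytes()
        row = {}
        for name, (compress, decompress) in CODECS.items():
            packed = compress(data)
            # Lossless means the round trip must reproduce the input exactly.
            if decompress(packed) != data:
                raise ValueError(f"{name} failed round trip on {path.name}")
            row[name] = bits_per_byte(len(data), len(packed))
        results[path.name] = row
    return results
```

Calling `benchmark("cantrbry")` (the directory name is hypothetical) returns a dictionary keyed by file name, with one bits-per-byte figure per codec, so the same harness can be pointed at the Calgary Corpus files as well.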
 
The [http://traces.cs.umass.edu/index.php/Main/HomePage UMass Trace Repository] provides network, storage, and other traces to the research community for analysis. The UMass Trace Repository is supported by grant #CNS-323597 from the National Science Foundation.

Revision as of 17:46, 12 July 2008

This page describes large-scale corpora of forensically interesting information that are available for those involved in forensic research.

Disk Images

The Harvard/MIT Drive Image Corpus. Between 1998 and 2006, Garfinkel acquired 1250+ hard drives on the secondary market. These hard drive images have proven invaluable in a range of studies, such as the development of new forensic techniques and research into the sanitization practices of computer users.
The Honeynet Project Forensic Challenge. In 2001 the Honeynet Project distributed a set of disk images and asked participants to conduct a forensic analysis of a compromised computer. Entries were judged and posted for all to see. The disk images and write-ups are still available online.
The Computer Forensic Reference Data Sets project from NIST hosts a few sample cases that may be useful for examiners to practice with.
The PyFlag standard test image set

Network Packets and Traces

DARPA ID Eval

The DARPA Intrusion Detection Evaluation. In 1998, 1999 and 2000 the Information Systems Technology Group at MIT Lincoln Laboratory created a test network complete with simulated servers, clients, clerical workers, programmers, and system managers. Baseline traffic was collected. The systems on the network were then “attacked” by simulated hackers. Some of the attacks were well-known at the time, while others were developed for the purpose of the evaluation.

WIDE

The MAWI Working Group of the WIDE Project maintains a Traffic Archive. In it you will find:

  • daily trace of a trans-Pacific T1 line;
  • daily trace at an IPv6 line connected to 6Bone;
  • daily trace at another trans-Pacific line (100Mbps link) in operation since 2006/07/01.

Traffic traces are captured with tcpdump, and the IP addresses in the traces are then scrambled with a modified version of tcpdpriv.

Wireshark

The open source Wireshark project (formerly known as Ethereal) has a website with many network packet captures:

NFS Packets

The Storage Networking Industry Association has a set of network file system traces that can be downloaded from:

Text Files

Email messages

The Enron Corpus is a collection of email messages that were seized by the Federal Energy Regulatory Commission during its investigation of Enron.

Log files

CRAWDAD is a community archive for wireless data.

CAIDA collects a wide variety of data.

DShield asks users to submit firewall logs.

Text for Text Retrieval

The Text REtrieval Conference (TREC) has made available a series of text collections.

American National Corpus

The American National Corpus (ANC) project is creating a massive collection of American English from 1990 onward. The goal is to create a corpus of at least 100 million words that is comparable to the British National Corpus.

British National Corpus

The British National Corpus (BNC) is a 100-million-word collection of written and spoken English from a variety of sources.

Voice

CALLFRIEND

CALLFRIEND is a database of recorded English conversations. A total of 60 recorded conversations are available from the University of Pennsylvania at a cost of $600.

TalkBank

TalkBank is an online database of spoken language. The project was originally funded between 1999 and 2004 by two National Science Foundation grants; ongoing support is provided by two NSF grants and one NIH grant.

Augmented Multi-Party Interaction Corpus

The AMI Meeting Corpus has 100 hours of meeting recordings.

Other Corpora

The Canterbury Corpus is a set of files used for testing lossless compression algorithms. The corpus consists of 11 natural files, 4 artificial files, 3 large files, and a file with the first million digits of pi. The website also hosts a copy of the Calgary Corpus, which was the de facto standard for testing lossless compression algorithms in the 1990s.

The UMass Trace Repository provides network, storage, and other traces to the research community for analysis. The UMass Trace Repository is supported by grant #CNS-323597 from the National Science Foundation.