Text File (TXT)

From ForensicsWiki

Latest revision as of 09:59, 31 January 2009

The Text file (TXT) format consists of 8-, 16- or 32-bit characters: printable characters along with some control characters such as tabs and line feeds. [1] Text files fall into several major types:

  • DOS/Windows format ends each line with a Carriage Return (CR) or char(13) followed by a Line Feed (LF) or char(10).
  • Unix format ends each line with only a Line Feed (LF) or char(10).
  • Classic Macintosh format (Mac OS 9 and earlier) ends each line with only a Carriage Return (CR) or char(13). Mac OS X and later use the Unix convention.
  • Unicode text may begin with an optional Byte Order Mark (BOM) that identifies the Unicode encoding and, for UTF-16 and UTF-32, the byte order (little endian or big endian). The BOM is 2 bytes long in UTF-16 (FF FE or FE FF), 3 bytes in UTF-8 (EF BB BF) and 4 bytes in UTF-32. Unicode defines encodings with 8-bit (UTF-8), 16-bit (UTF-16) and 32-bit (UTF-32) code units. The earlier UCS-2 encoding is a fixed-width 16-bit subset of UTF-16, and UCS-4 is essentially equivalent to UTF-32.
  • EBCDIC uses the New Line (NL) character, 0x15 (decimal 21), to end a line. [2]
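As a rough illustration of the line ending conventions listed above, the following Python sketch counts CR LF, bare LF and bare CR terminators in a file's raw bytes and guesses the convention from them. The function names are hypothetical. The sketch assumes an ASCII-compatible single-byte encoding (ASCII, Latin-1, UTF-8). UTF-16, UTF-32 and EBCDIC files would need to be decoded or handled separately.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR LF, bare LF and bare CR line terminators in raw bytes.

    Assumes an ASCII-compatible encoding where CR is 0x0D and LF is 0x0A.
    """
    crlf = data.count(b"\r\n")        # DOS/Windows terminators
    lf = data.count(b"\n") - crlf     # LF not preceded by CR (Unix)
    cr = data.count(b"\r") - crlf     # CR not followed by LF (classic Mac OS)
    return {"dos": crlf, "unix": lf, "mac": cr}


def guess_line_ending_style(data: bytes) -> str:
    """Return "dos", "unix", "mac", "mixed" or "none"."""
    counts = count_line_endings(data)
    present = [style for style, n in counts.items() if n > 0]
    if not present:
        return "none"
    if len(present) > 1:
        return "mixed"
    return present[0]
```

A "mixed" result may be worth noting in itself. For example, it can suggest that a file was edited or concatenated on more than one system.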

They are usually ASCII encoded, although other encodings are possible to allow various language scripts to be used. Other encodings include EBCDIC, used on IBM mainframes. Text files can have the MIME type "text/plain", often with a parameter indicating the encoding (e.g. "text/plain;charset=UTF-8"). Any basic text reader can be used to view the contents of a simple text file. However, some readers (notably Notepad) have issues with certain less common encodings. WordPad, which is included with Windows, may display such files properly.
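When a BOM is present, it sits at the very start of the file. A first pass at identifying a Unicode encoding can therefore compare the leading bytes against the known BOM sequences. The following Python sketch uses the BOM constants from the standard `codecs` module. `detect_bom` is a hypothetical helper name. The absence of a BOM proves nothing: ASCII files, most UTF-8 files, and files whose encoding is declared only through a MIME charset carry none.

```python
import codecs

# Longest BOMs first: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so it has to be tested before it. Note that a
# UTF-16 LE file whose first character after the BOM is NUL would be
# reported as UTF-32 LE; the bytes alone cannot settle that case.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (Python codec name, BOM length), or (None, 0) if no BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


# Usage: skip the BOM, then decode the rest with the detected codec.
sample = codecs.BOM_UTF16_BE + "h\u00e9".encode("utf-16-be")
name, length = detect_bom(sample)
text = sample[length:].decode(name)
```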

Translating a DOS/Windows text file to Unix format is done by removing the Carriage Return from the end of each line. The reverse adds a Carriage Return before each Line Feed. Files that show an extra blank line between every line of text may have been improperly translated from one system to another.
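The translation described above can be sketched in Python as plain byte replacements. The helper names are hypothetical. The sketch assumes an ASCII-compatible encoding in which CR and LF are the single bytes 0x0D and 0x0A.

The last function illustrates one way the double spacing can arise. Treating every CR of a DOS file as a line break in its own right, as a classic Mac OS to Unix converter would, leaves each line followed by an empty one.

```python
def dos_to_unix(data: bytes) -> bytes:
    """Convert CR LF line endings to LF by dropping the CR before each LF."""
    return data.replace(b"\r\n", b"\n")


def unix_to_dos(data: bytes) -> bytes:
    """Convert LF line endings to CR LF.

    Existing CR LF pairs are normalized to LF first, so lines that are
    already in DOS format do not end up as CR CR LF.
    """
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def naive_cr_to_lf(data: bytes) -> bytes:
    """Classic Mac OS to Unix: turn every CR into LF.

    Correct for classic Mac OS files, but applied to a DOS file each
    CR LF pair becomes LF LF, i.e. a spurious blank line after every line.
    """
    return data.replace(b"\r", b"\n")
```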

Text files usually have the .txt extension. A number of file formats are actually text files but bear different extensions. For example, web documents (HTML files) are text files, but they are written with a specific syntax so that the applications designed to work with them can read them correctly. Other kinds of files that can be seen as text files include source code files, XML files, etc.