Difference between pages "PDF" and "Windows SuperFetch Format"

From Forensics Wiki
(Difference between pages)
 
(MEMO file)
 
Line 1: Line 1:
The '''Portable Document Format''' ('''PDF''') is a document format from [[Adobe]] Inc. It is widely available on the web. PDF was originally developed as a proprietary format; version 1.7 was released as an open standard in 2008 and is published as ISO 32000-1:2008. Although PDF is an open standard, Adobe still owns patents and copyrights related to it. Adobe has granted a worldwide royalty-free license to produce PDF software, but only if the software complies with the PDF standard.
+
{{expand}}
  
== Format ==
+
== MEMO file ==
 +
The MEMO file consists of:
 +
* file header
 +
* compressed blocks
  
It is a common misconception that PDF files are simply a collection of images, one per page. A PDF can certainly be formed that way (which is typical of document scanners), but in reality the document structure is much more complex. A PDF file can contain text streams (which can be encoded and/or compressed in dozens of ways), vector and raster images, fonts, and various interactive elements.
+
=== File header ===
A PDF file comprises sections called "objects." Each object is numbered and can represent a page, a font, a data stream, etc. Each file begins with the string <tt>%PDF</tt> and ends with the marker <tt>%%EOF</tt>, but a single file can contain multiple <tt>%%EOF</tt> markers (this often confuses programs like [[foremost]] that search for footers).
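
The header and footer markers described above can be checked with a few lines of Python. This is a minimal sketch, not a validating parser: the function names <tt>looks_like_pdf</tt> and <tt>find_eof_markers</tt> are hypothetical, and the code assumes the whole file fits in memory.

```python
from typing import List


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the data begins with the %PDF string."""
    return data.startswith(b"%PDF")


def find_eof_markers(data: bytes) -> List[int]:
    """Return the byte offset of every %%EOF marker in the data."""
    offsets = []
    pos = data.find(b"%%EOF")
    while pos != -1:
        offsets.append(pos)
        # Continue searching just past the start of the previous hit.
        pos = data.find(b"%%EOF", pos + 1)
    return offsets
```

A carving tool that stops at the first offset returned by <tt>find_eof_markers</tt> would truncate any file that contains more than one marker, which is the footer-search problem mentioned above.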
+
The file header is 84 bytes in size and consists of:
 +
{| class="wikitable"
 +
|-
 +
! Offset
 +
! Size
 +
! Value
 +
! Description
 +
|-
 +
| 0
 +
| 4
 +
| 0x304D454D ("MEM0") or 0x4F4D454D ("MEMO")
 +
| Signature
 +
|-
 +
| 4
 +
| 4
 +
|
 +
| Uncompressed (total) data size
 +
|-
 +
|}
  
Adobe's Acrobat software supports "incremental updates." The standard allows modifications to be appended to the file, leaving the original data intact: any new or altered object is appended to the end of the original file, and deleted objects are left in place and merely marked as deleted. Because earlier content remains in the file, this can cause inadvertent disclosure of sensitive information.
+
=== Compressed blocks ===
 +
The file header is followed by compressed blocks:
 +
{| class="wikitable"
 +
|-
 +
! Offset
 +
! Size
 +
! Value
 +
! Description
 +
|-
 +
| 0
 +
| 4
 +
|
 +
| Compressed data size
 +
|-
 +
| 4
 +
| ...
 +
|
 +
| Compressed data
 +
|-
 +
|}
  
== Metadata ==
+
== See Also ==
 +
* [[SuperFetch]]
  
PDF metadata can be stored in a document information dictionary, in a metadata stream, or in both. A metadata stream can describe the entire document or an individual component of a document. Thus, multiple metadata streams may exist in a single document, making it difficult to find all of them. Metadata streams are stored in Adobe's XML-based XMP (Extensible Metadata Platform) format. Even if a PDF document is encrypted, the accompanying metadata is not required to be encrypted, and often is not.
+
== External Links ==
 
+
* [http://blog.rewolf.pl/blog/?p=214 Windows SuperFetch file format – partial specification]
The metadata (or parts of it) can be extracted with [[pdfinfo]], a utility which is part of the [[xpdf]] package.
+
 
+
== Embedded Objects ==
+
 
+
The PDF standard supports embedding many types of files, such as images. Embedded files may contain their own metadata. You can use [[pdfimages]], part of the [[xpdf]] package, to extract all of the images from a PDF file and write each one to its own file.
+
 
+
== Subformats ==
+
 
+
Several related standards exist that contain subsets or supersets of the PDF standard features. These standards include:
+
 
+
* PDF/A, a simpler set of features for archiving documents, allowing for long-term reproducibility. Some scanning software saves documents in PDF/A by default.
+
* PDF/X for graphic arts.
+
* PDF/UA for universal accessibility.
+
* PDF/E for engineering drawings.
+
 
+
== PDF Software ==
+
 
+
Due to the popularity of the PDF format, there is much software available for viewing and creating PDF documents. However, Adobe maintains a de facto monopoly on software capable of editing PDF documents. There are quite a few tools that merge or split PDF documents, but few that can make meaningful edits. Software such as OpenOffice.org and Inkscape can import PDF files into their native formats, where the documents can be edited and then exported back to PDF. Unfortunately, this option can be quite cumbersome.
+
 
+
=== PDF Tools ===
+
These tools are useful for analyzing PDF files:
+
; Origami
+
: http://security-labs.org/origami/
+
: A powerful open source framework and GUI written in Ruby. It allows for parsing and exploring PDF files and graphically browsing their contents.
+
 
+
; PDF Tools
+
: http://blog.didierstevens.com/programs/pdf-tools/
+
: Didier Stevens' [http://blog.didierstevens.com/2008/10/30/pdf-parserpy/ pdf-parser] and pdfid, both written in Python.
+
 
+
; pdfresurrect
+
: http://www.757labs.com/projects/pdfresurrect/#downloads
+
: Retrieves previous versions of PDF files that have changes appended with "incremental updates"
+
 
+
; PDFMiner
+
: http://www.unixuser.org/~euske/python/pdfminer/index.html
+
: "Python PDF parser and analyzer"
+
: Includes '''pdf2txt.py''' command-line tool for extracting text from PDF files, and '''dumppdf.py''' for dumping PDF objects.
+
 
+
; pyPdf
+
: http://pybrary.net/pyPdf/
+
: "A Pure-Python library built as a PDF toolkit."
+
: Will encrypt and decrypt PDF files.
+
 
+
; QPDF
+
: http://sourceforge.net/projects/qpdf/
+
: Open source, cross-platform library and set of programs to inspect and manipulate PDF files. Packaged in recent Debian-based distributions.
+
 
+
These tools are useful for manipulating and generating PDF files:
+
; ReportLab Open Source PDF Library
+
: http://www.reportlab.com/software/opensource/
+
: "our proven, industry-strength PDF generating software. Programmatically create any kind of PDF document"
+
 
+
== See Also ==
+
 
+
* [[Arabic PDFs]]
+
* [[Tools:Document Metadata Extraction]]
+
 
+
== External Links ==  
+
 
+
* [http://partners.adobe.com/public/developer/pdf/index_reference.html Adobe PDF Reference]
+
* [http://en.wikipedia.org/wiki/PDF Wikipedia: PDF]
+
* [http://www.mactech.com/articles/mactech/Vol.15/15.09/PDFIntro/ Portable Document Format: An Introduction for Programmers], MacTech Magazine, Volume 15, (1999), Issue 9
+
* [http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=51502 ISO Standard]
+
* [http://partners.adobe.com/public/developer/support/topic_legal_notices.html Patent Licenses]
+
* [http://blog.didierstevens.com/2008/04/09/quickpost-about-the-physical-and-logical-structure-of-pdf-files/ Quickpost: About the Physical and Logical Structure of PDF Files], by Didier Stevens, April 9, 2008
+
  
 
[[Category:File Formats]]
 

Revision as of 12:37, 14 April 2014


Please help to improve this article by expanding it.
Further information might be found on the discussion page.


MEMO file

The MEMO file consists of:

  • file header
  • compressed blocks

File header

The file header is 84 bytes in size and consists of:

Offset  Size  Value                                        Description
0       4     0x304D454D ("MEM0") or 0x4F4D454D ("MEMO")   Signature
4       4                                                  Uncompressed (total) data size
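
The two documented header fields can be read as in the following sketch. It assumes both fields are little-endian 32-bit unsigned integers, which is consistent with the signature values above (0x304D454D corresponds to the bytes "MEM0" only when read little-endian). The function name parse_memo_header is hypothetical, and the rest of the 84-byte header is not described here, so the sketch does not interpret it.

```python
import struct

# Signature values from the table above, read as little-endian 32-bit integers.
SIGNATURES = {0x304D454D: "MEM0", 0x4F4D454D: "MEMO"}

# Header size stated in the text; only the first 8 bytes are documented.
HEADER_SIZE = 84


def parse_memo_header(data: bytes) -> dict:
    """Parse the signature and uncompressed size (hypothetical helper)."""
    if len(data) < 8:
        raise ValueError("need at least 8 bytes for the documented fields")
    # Assumption: both fields are little-endian unsigned 32-bit integers.
    signature, uncompressed_size = struct.unpack_from("<II", data, 0)
    if signature not in SIGNATURES:
        raise ValueError(f"unexpected signature 0x{signature:08X}")
    return {
        "signature": SIGNATURES[signature],
        "uncompressed_size": uncompressed_size,
    }
```

For example, parse_memo_header(open(path, "rb").read(HEADER_SIZE)) would return the signature name and the total uncompressed size for a file at a caller-supplied path.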

Compressed blocks

The file header is followed by compressed blocks:

Offset  Size  Value  Description
0       4            Compressed data size
4       ...          Compressed data
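
The block layout above can be walked as in the following sketch, which rests on several assumptions not stated in the table: the size field is a little-endian 32-bit unsigned integer, it counts only the compressed data that follows it (not the size field itself), and the first block begins immediately after the 84-byte file header. The function name iter_compressed_blocks is hypothetical, and the sketch does not decompress anything because the compression method is not described here.

```python
import struct
from typing import Iterator, Tuple


def iter_compressed_blocks(data: bytes, start: int = 84) -> Iterator[Tuple[int, bytes]]:
    """Yield (block offset, compressed payload) pairs (hypothetical helper).

    Assumptions: little-endian 32-bit size field, size excludes the field
    itself, and blocks start right after the 84-byte file header.
    """
    offset = start
    while offset + 4 <= len(data):
        (size,) = struct.unpack_from("<I", data, offset)
        payload_start = offset + 4
        payload_end = payload_start + size
        if payload_end > len(data):
            raise ValueError(f"block at offset {offset} runs past end of file")
        yield offset, data[payload_start:payload_end]
        # The next block is assumed to follow the payload directly.
        offset = payload_end
```

If any of these assumptions is wrong for a given file, the walk will typically fail quickly with the out-of-bounds error, which makes the sketch useful as a sanity check on the layout.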

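See Also

  • SuperFetch

External Links

  • Windows SuperFetch file format – partial specification (http://blog.rewolf.pl/blog/?p=214)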