Document Metadata Extraction
Revision as of 21:40, 5 October 2008 by Simsong
Here are tools that will extract metadata from document files.
- Extracts metadata from various Microsoft Word files (doc). Can also convert doc files to other formats such as HTML or plain text.
- pdfinfo (part of the xpdf package) displays some metadata of PDF files.
- pdfimages (part of the xpdf package) extracts all of the images from a PDF file and writes each one to its own file.
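As a rough illustration of scripting pdfinfo, the sketch below runs it on a file and collects its "Key: value" output lines into a dictionary. It assumes pdfinfo is installed and on the PATH. The function names parse_pdfinfo_output and pdf_metadata are hypothetical helpers introduced here, not part of xpdf.

```python
import subprocess


def parse_pdfinfo_output(text):
    """Turn pdfinfo-style 'Key: value' lines into a dict of strings."""
    metadata = {}
    for line in text.splitlines():
        # Split on the first colon only, so values that contain colons
        # (such as timestamps) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata


def pdf_metadata(path):
    """Run pdfinfo on `path` and return its parsed output.

    Assumption: the pdfinfo binary from xpdf is available on the PATH.
    Raises subprocess.CalledProcessError if pdfinfo exits with an error.
    """
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    )
    return parse_pdfinfo_output(result.stdout)
```

Calling pdf_metadata("report.pdf") (a hypothetical file name) returns whichever fields pdfinfo reports for that file. All values are left as strings so that nothing is lost in conversion.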
The following general-purpose programs often work when the special-purpose programs above fail, but they generally provide less detailed information.
- The UNIX file program can extract some metadata.
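The file program can be scripted in the same way. This minimal sketch assumes a UNIX-like system with file on the PATH; describe_file is a hypothetical helper name. It returns None when the program is not available rather than failing.

```python
import shutil
import subprocess


def describe_file(path):
    """Return the one-line description that `file` prints for `path`.

    Assumption: the UNIX `file` program is installed. If it is not found
    on the PATH, None is returned instead.
    """
    if shutil.which("file") is None:
        return None
    # -b (brief) omits the leading file name from the output line.
    result = subprocess.run(
        ["file", "-b", path], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

How much metadata appears in the description depends on the file format and on the version of file in use.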