Document Metadata Extraction
Revision as of 20:21, 20 March 2009 by Simsong
The following tools extract metadata from document files.
- Extracts metadata from various Microsoft Word files (doc). Can also convert doc files to other formats such as HTML or plain text.
- pdfinfo (part of the xpdf package) displays some metadata of PDF files.
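When many PDF files need to be examined, the output of pdfinfo can be collected from a script. The following is a minimal sketch, assuming pdfinfo is installed and on the PATH and that it prints one "Key: value" pair per line. The function names parse_pdfinfo_output and pdf_metadata are illustrative and not part of xpdf.

```python
import subprocess


def parse_pdfinfo_output(text):
    """Turn 'Key: value' lines (the layout pdfinfo is assumed to print) into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values that contain colons
        # (for example timestamps) are kept whole.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def pdf_metadata(path):
    """Run pdfinfo on one file and return its metadata as a dict.

    Raises subprocess.CalledProcessError if pdfinfo exits with an error,
    and FileNotFoundError if pdfinfo is not installed.
    """
    result = subprocess.run(
        ["pdfinfo", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_pdfinfo_output(result.stdout)
```

Calling pdf_metadata("report.pdf") (a hypothetical file name) returns whatever fields pdfinfo reports for that file. Which fields appear depends on the PDF and on the pdfinfo version.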
These general-purpose programs frequently work when the special-purpose programs fail, but they generally provide less detailed information.
- Metadata Extraction Tool
- "Developed by the National Library of New Zealand to programmatically extract preservation metadata from a range of file formats like PDF documents, image files, sound files, Microsoft Office documents, and many others."
- The UNIX file program can extract some metadata.
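The file program can also be called from a script as a fallback when a special-purpose tool fails. The following is a minimal sketch, assuming a file implementation that accepts the --brief option (which suppresses the leading file name). The function name file_description is illustrative.

```python
import shutil
import subprocess


def file_description(path):
    """Return the one-line description that file prints for path.

    Returns None if the file program is not installed.
    """
    if shutil.which("file") is None:
        return None
    result = subprocess.run(
        ["file", "--brief", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

The description is free-form text, and how much metadata it contains varies with the file type and the version of file, so treat it as a hint to follow up on with other tools.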