Chris Cannam, 2014-08-04 12:50 PM

h1. Wiki

h2. Other tools / better ways to do this

Tools that look like they might do text and/or metadata extraction from PDFs:

 * "Apache Tika":http://tika.apache.org/ (text + metadata, Java)
 * "Grobid":https://github.com/kermitt2/grobid (biblio metadata, Java + native)
 * "Textract":http://datascopeanalytics.com/what-we-think/2014/07/27/extract-text-from-any-document-no-muss-no-fuss (text, Python wrapper for other utilities?)