Version 1 - History - Wiki - pdfindex


Chris Cannam, 2014-08-04 12:50 PM

h1. Wiki

h2. Other tools / better ways to do this

Tools that look like they might do text and/or metadata extraction from PDFs:

* "Apache Tika":http://tika.apache.org/ (text + metadata, Java)
* "Grobid":https://github.com/kermitt2/grobid (bibliographic metadata, Java + native)
* "Textract":http://datascopeanalytics.com/what-we-think/2014/07/27/extract-text-from-any-document-no-muss-no-fuss (text, Python; possibly a wrapper around other utilities)