Chris Cannam, 2014-08-04 12:50 PM
h1. Wiki

h2. Other tools / better ways to do this

Tools that look like they might do text and/or metadata extraction from PDFs:

* "Apache Tika":http://tika.apache.org/ (text + metadata, Java)
* "Grobid":https://github.com/kermitt2/grobid (biblio metadata, Java + native)
* "Textract":http://datascopeanalytics.com/what-we-think/2014/07/27/extract-text-from-any-document-no-muss-no-fuss (text, Python wrapper for other utilities?)