Wiki
Other tools / better ways to do this
Tools that appear to do text and/or metadata extraction from PDFs:
- Apache Tika (text + metadata, Java)
- Grobid (biblio metadata, Java + native)
- Textract (text, Node wrapper for other utilities?)
- Textract (text, Python wrapper for other utilities?)
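To give a feel for what trying one of these would involve, here is a minimal sketch built around the Python textract wrapper listed above. It assumes the package is installed (`pip install textract`) along with whatever external utilities it relies on for PDFs. The function name `extract_pdf_text` and its `process` parameter are hypothetical, introduced only for illustration; the one library call used is `textract.process`, which returns bytes.

```python
from typing import Callable, Optional


def extract_pdf_text(path: str,
                     process: Optional[Callable[[str], bytes]] = None) -> str:
    """Return the text of the file at `path` as a str.

    `process` (hypothetical parameter) is any callable mapping a file path
    to raw bytes. When omitted, the third-party `textract` package is
    imported lazily and `textract.process` is used.
    """
    if process is None:
        import textract  # assumption: installed separately, not stdlib
        process = textract.process
    raw = process(path)
    # textract.process returns bytes, so decode before returning.
    return raw.decode("utf-8", errors="replace").strip()


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        print(extract_pdf_text(sys.argv[1]))
```

Keeping the extractor behind a plain callable should make it easy to swap in one of the other tools (for example, a function that calls a Tika server) and compare the results on the same PDFs.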