pdfindex

Other tools / better ways to do this

Tools that look like they might do text and/or metadata extraction from PDFs:

Apache Tika (text + metadata, Java)
Grobid (biblio metadata, Java + native)
Textract (text, Node wrapper for other utilities?)
Textract (text, Python wrapper for other utilities?)
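As a rough illustration of how one of these tools could replace pdfindex's own extraction step, here is a minimal Python sketch that talks to Apache Tika over HTTP. It assumes a separately started Tika server (the tika-server jar) listening on its default port 9998, and uses Tika's `/tika` endpoint for plain text and `/meta` endpoint for metadata as JSON. The helper names (`build_tika_request`, `extract_text`, `extract_metadata`) and the `TIKA_URL` constant are hypothetical, not part of pdfindex or Tika.

```python
import json
import urllib.request

# Assumption: a Tika server is running locally on its default port (9998).
TIKA_URL = "http://localhost:9998"


def build_tika_request(pdf_bytes: bytes, endpoint: str, accept: str,
                       base_url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request that uploads the raw PDF bytes to a Tika endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/{endpoint}",
        data=pdf_bytes,
        method="PUT",
        headers={"Accept": accept, "Content-Type": "application/pdf"},
    )


def extract_text(pdf_bytes: bytes, base_url: str = TIKA_URL) -> str:
    """Ask Tika's /tika endpoint for the document's plain-text content."""
    req = build_tika_request(pdf_bytes, "tika", "text/plain", base_url)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def extract_metadata(pdf_bytes: bytes, base_url: str = TIKA_URL) -> dict:
    """Ask Tika's /meta endpoint for the document's metadata as a JSON object."""
    req = build_tika_request(pdf_bytes, "meta", "application/json", base_url)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage, assuming the server is up: read a file with `open("paper.pdf", "rb").read()` and pass the bytes to `extract_text` or `extract_metadata`. Running Tika as a server keeps the indexer itself free of a Java dependency; the trade-off is one HTTP round trip per document.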