Other tools / better ways to do this

Tools that appear to extract text and/or metadata from PDFs:

  • Apache Tika (text + metadata, Java)
  • Grobid (biblio metadata, Java + native)
  • Textract (text, Node wrapper for other utilities?)
  • Textract (text, Python wrapper for other utilities?)
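As a rough illustration of the "text + metadata" case, here is a minimal sketch of calling Apache Tika over HTTP from Python using only the standard library. It assumes a Tika Server instance is already running locally on its default port (9998). The helper names (build_tika_request, extract_text, extract_metadata) are made up for this example and are not part of Tika.

```python
import json
import urllib.request

# Assumption: a Tika Server is running locally on its default port.
TIKA_SERVER = "http://localhost:9998"


def build_tika_request(pdf_bytes, endpoint, accept, server=TIKA_SERVER):
    """Build (but do not send) a PUT request for a Tika Server endpoint."""
    return urllib.request.Request(
        url=f"{server}/{endpoint}",
        data=pdf_bytes,
        method="PUT",
        headers={"Accept": accept, "Content-Type": "application/pdf"},
    )


def extract_text(pdf_bytes, server=TIKA_SERVER):
    """Send the PDF to the /tika endpoint and return the extracted plain text."""
    req = build_tika_request(pdf_bytes, "tika", "text/plain", server)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def extract_metadata(pdf_bytes, server=TIKA_SERVER):
    """Send the PDF to the /meta endpoint and return the metadata as a dict."""
    req = build_tika_request(pdf_bytes, "meta", "application/json", server)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, reading a file's bytes and passing them to extract_text or extract_metadata should be enough to compare Tika's output against the other tools listed above.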