A toolkit for converting PDFs and other image-based document formats into clean, readable plain text. olmOCR-Bench: We also ship a comprehensive benchmark ...