Creating a searchable PDF with open-source tools: Ghostscript, hocr2pdf and tesseract-ocr

I bet creating searchable PDFs has been done many times over, but I'd still like to share how I did it recently using strictly open-source tools. The pipeline is simple: Ghostscript (GS) splits the PDF into page images, Tesseract OCR extracts the text of each page as hOCR, hocr2pdf merges each page image with its recognized text into a searchable page PDF, and GS bundles those pages back into a single PDF. If you're creating PDFs from scanned books, this project may also help: unpaper.
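If you'd rather script the four steps than type them by hand, here is a minimal Python sketch of the same pipeline. It's my illustration, not the exact commands from the full post, and it assumes `gs`, `tesseract` and `hocr2pdf` are on your PATH, that 300 dpi grayscale TIFFs are good enough for OCR, and that Tesseract writes `<base>.hocr` (older 3.0x releases write `<base>.html` instead). The helper names (`split_cmd`, `ocr_cmd`, `hocr2pdf_cmd`, `merge_cmd`, `make_searchable`) are hypothetical.

```python
import glob
import subprocess
import sys
from pathlib import Path


def split_cmd(pdf, out_pattern="page-%04d.tif", dpi=300):
    """Ghostscript: rasterize every page of `pdf` to a grayscale TIFF."""
    return ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=tiffgray",
            f"-r{dpi}", f"-sOutputFile={out_pattern}", pdf]


def ocr_cmd(image):
    """tesseract: write hOCR for one page; the output base is the image name minus its suffix."""
    return ["tesseract", image, str(Path(image).with_suffix("")), "hocr"]


def hocr2pdf_cmd(image, pdf):
    """hocr2pdf: put the image in a PDF with a hidden text layer (hOCR is read from stdin)."""
    return ["hocr2pdf", "-i", image, "-o", pdf]


def merge_cmd(page_pdfs, out_pdf):
    """Ghostscript: concatenate the per-page PDFs into one document."""
    return ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
            f"-sOutputFile={out_pdf}", *page_pdfs]


def make_searchable(in_pdf, out_pdf):
    # Page images land in the current directory; stale page-*.tif files would be picked up too.
    subprocess.run(split_cmd(in_pdf), check=True)
    page_pdfs = []
    for image in sorted(glob.glob("page-*.tif")):
        subprocess.run(ocr_cmd(image), check=True)
        stem = Path(image).with_suffix("")
        hocr = stem.with_suffix(".hocr")  # assumption: use ".html" for older Tesseract
        page_pdf = f"{stem}.pdf"
        with open(hocr, "rb") as f:
            subprocess.run(hocr2pdf_cmd(image, page_pdf), stdin=f, check=True)
        page_pdfs.append(page_pdf)
    subprocess.run(merge_cmd(page_pdfs, out_pdf), check=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    make_searchable(sys.argv[1], sys.argv[2])
```

Run it as `python make_searchable.py scan.pdf scan-searchable.pdf`. Building each command as an argument list keeps the `%04d` page pattern away from the shell and makes every step easy to inspect or tweak on its own.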

Edit 5/21/2014: I've had a good experience with ScanTailor, which is available through Homebrew on the Mac. I've also submitted hocr2pdf to Homebrew as part of the exact-image library (the formula is named "exact-image").

Vertex array objects with shaders on OpenGL 2.1 & GLSL 1.2 [w/code]

Phew. Finally, this is working!

I've been confined to OpenGL 2.1 and GLSL 1.2 on the Mac because the Qt OpenGL context will not pick up the core OpenGL profile (a big problem in its own right), so I can't get an OpenGL 3.x context with GLSL 1.5. So it's back to old-school GL'ing, but some things do work, quirks and all.
So, for all of you fighting the OpenGL 2.1 war, here's how I got VAOs working with a very simple shader.