Jim Willis

Troubleshooting PDF OCR using Python on Mac

I wrote a script to extract some text from a PDF (image-based text, so pdftotext wouldn’t work).

Using pdf2image’s convert_from_path, I simply could not get any data out of the PDF. I tried multiple PDFs while testing, and convert_from_path just kept returning an empty result.
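An empty result is easy to miss because nothing fails loudly. Here is a minimal sketch of a guard that raises when no pages come back. The helper name `convert_or_raise` and its `convert` parameter are hypothetical, added for illustration; only `convert_from_path` comes from pdf2image.

```python
from typing import Callable, List, Optional


def convert_or_raise(
    pdf_path: str,
    convert: Optional[Callable[[str], List]] = None,
) -> List:
    """Convert a PDF to page images and raise if nothing comes back.

    `convert` is a hypothetical hook so a stand-in converter can be passed;
    by default pdf2image's convert_from_path is used.
    """
    if convert is None:
        # Imported lazily so this file still loads without pdf2image installed.
        from pdf2image import convert_from_path
        convert = convert_from_path

    pages = convert(pdf_path)
    if not pages:
        # Turn the silent empty result into an explicit error.
        raise RuntimeError(
            f"No pages were produced for {pdf_path!r}; "
            "check which pdfinfo binary is on your PATH."
        )
    return pages
```

Calling `convert_or_raise("scan.pdf")` (the file name is just a placeholder) would then either return the page images or stop with a clear error instead of an empty value.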

It turned out that my Homebrew install of xpdf was interfering with my Homebrew install of poppler.

Uninstalling xpdf (brew uninstall xpdf) and reinstalling poppler (brew install poppler) seemed to fix things up. My suspicion is that they both ship their own version of pdfinfo, which pdf2image uses. That is just a hunch; I don’t know enough about what’s going on under the hood. So if pdf2image isn’t working correctly for you and you’re on a Mac, make sure you’ve got poppler installed and that xpdf’s pdfinfo isn’t the one being used.
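One way to check which pdfinfo is actually being picked up is to look at the first match on PATH and read its version banner. The sketch below uses only the standard library. It assumes that running pdfinfo with -v prints a banner mentioning either "poppler" or "xpdf", which may not hold for every build. The function names are hypothetical.

```python
import shutil
import subprocess
from typing import Optional, Tuple


def classify_pdfinfo_banner(banner: str) -> str:
    """Guess which project a pdfinfo version banner comes from.

    Poppler is checked first because its banner can also carry the
    older Xpdf copyright line.
    """
    text = banner.lower()
    if "poppler" in text:
        return "poppler"
    if "xpdf" in text:
        return "xpdf"
    return "unknown"


def describe_pdfinfo() -> Optional[Tuple[str, str]]:
    """Return (path, flavor) for the first pdfinfo on PATH, or None."""
    path = shutil.which("pdfinfo")
    if path is None:
        return None
    # The banner may go to stdout or stderr depending on the build,
    # so both streams are inspected.
    result = subprocess.run([path, "-v"], capture_output=True, text=True)
    return path, classify_pdfinfo_banner(result.stdout + result.stderr)


if __name__ == "__main__":
    found = describe_pdfinfo()
    if found is None:
        print("No pdfinfo found on PATH; try: brew install poppler")
    else:
        print(f"pdfinfo at {found[0]} looks like: {found[1]}")
```

If the answer is xpdf, uninstalling it as described above is the simplest fix. If you need to keep both installed, convert_from_path also accepts a poppler_path argument pointing at the directory that holds poppler's binaries.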

Posted

January 3, 2020

Uncategorized




