AlL uR nEeDs!!!: Convert Scanned PDF Documents to Text with Google OCR

There are two types of PDF documents: those created by sending
Office files, images, etc. to an Acrobat-like PDF printer, and those
created by scanning physical paper, such as pages of a book or legal
documents.

Google has always been able to index PDF documents created by conversion, but it now also uses OCR to recognize text in PDFs that were generated by scanning paper documents.

This is a scanned document and this is the HTML text view of that same document, as converted by Google.

Since scanned PDFs are nothing but images, don’t be surprised if
Google adds a "search by text" function to its Image Search engine,
similar to the one in OneNote or Evernote. That will surely be huge.

Convert Scanned PDFs to Text

Now, if you have a bunch of scanned PDF files on your hard drive and no OCR software, here’s what you can do to convert them into recognizable text.

Create a folder on your website (say abc.com/pdf) and upload all the
scanned PDFs to that folder. Now create a public web page that links to
all the PDF files. Then wait for Google's crawler to spider the page and the PDFs.
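If there are many files, the link page can be generated rather than written by hand. Here is a minimal sketch in Python, assuming the PDFs sit in one local folder that mirrors abc.com/pdf; the function names and the index.html file name are made up for illustration and are not part of any Google tool.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote


def build_index_html(pdf_names, base_url):
    """Return a minimal HTML page with one link per PDF file name."""
    base = base_url.rstrip("/")
    items = []
    for name in sorted(pdf_names):
        # Percent-encode the file name so spaces etc. give a valid URL.
        href = f"{base}/{quote(name)}"
        items.append(f'  <li><a href="{escape(href)}">{escape(name)}</a></li>')
    return (
        "<!DOCTYPE html>\n<html>\n<head><title>Scanned PDFs</title></head>\n"
        "<body>\n<ul>\n" + "\n".join(items) + "\n</ul>\n</body>\n</html>\n"
    )


def write_index(folder, base_url):
    """Write index.html into `folder`, linking every *.pdf found there."""
    folder = Path(folder)
    names = [p.name for p in folder.glob("*.pdf")]
    page = build_index_html(names, base_url)
    (folder / "index.html").write_text(page, encoding="utf-8")
    return names
```

Calling write_index("pdf", "http://abc.com/pdf") on the local copy of the folder would produce an index.html to upload alongside the PDFs. The page still has to be publicly reachable for a crawler to find it.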

Once that's done, search Google for "site:abc.com/pdf filetype:pdf" to see the PDF documents as HTML.
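The query can also be built and URL-encoded in code, which helps if you check several folders. This is only a sketch: the helper names are hypothetical, and it assumes the usual google.com/search?q=... address form. It builds the search URL and does not fetch any results.

```python
from urllib.parse import urlencode


def site_pdf_query(site_path):
    """Build the 'site:... filetype:pdf' query for a site folder."""
    return f"site:{site_path} filetype:pdf"


def google_search_url(query):
    """Return a Google search URL with the query percent-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # abc.com/pdf is the placeholder folder used in this post.
    print(google_search_url(site_pdf_query("abc.com/pdf")))
```

Open the printed URL in a browser to run the search.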

Posted on : Monday, November 3, 2008 | By : Rajat | In : Tips, Tricks
