OCR depends so much on the quality of the original. If it is a PDF, it seems
to come over OK most of the time. However, a background image or watermark
can mess it up. If the original is a fax, it gets it wrong a lot. A clean,
unfolded, laser-printed document in Times New Roman comes over pretty close
to 100%.
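I think there are two ways PDFs store text: as text streams or as images. The
text streams come over better, most likely because that text can be read
straight out of the file, so real OCR is only needed for the image kind.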
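
If you want to guess which kind of PDF you have before running OCR, here is a rough sketch in Python. It is only a crude byte-level heuristic, not a real PDF parser: it looks for font resources (a hint that there is a text layer) and image objects (a hint that pages are scans). The function name guess_pdf_kind and the labels it returns are made up for this example. It can miss things in newer PDFs that keep their objects in compressed streams, and a scan that has already been OCR'd will usually show up as "mixed".

```python
import re

# A "/Font" resource entry suggests real text; "\b" avoids matching
# longer names such as "/FontDescriptor".
FONT_RE = re.compile(rb"/Font\b")
# An image XObject suggests scanned or embedded page images.
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def guess_pdf_kind(data: bytes) -> str:
    """Guess from raw PDF bytes: 'text', 'image', 'mixed' or 'unknown'.

    Hypothetical helper for illustration only; it pattern-matches the
    raw bytes and does not parse the PDF structure.
    """
    has_font = bool(FONT_RE.search(data))
    has_image = bool(IMAGE_RE.search(data))
    if has_font and has_image:
        return "mixed"
    if has_font:
        return "text"
    if has_image:
        return "image"
    return "unknown"
```

To try it on a file, read it in binary mode and pass the bytes, e.g. guess_pdf_kind(open("scan.pdf", "rb").read()), where scan.pdf is just a placeholder name. If it says "text", try copying or extracting the text directly before bothering with OCR.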