Change The Letters In Your Search

Tip – Change the Letters in Your Search

If you do newspaper research online, whether for history or for genealogy, then you have certainly been puzzled by some of the search results (or lack thereof) that you have received.

Creating newspaper images and applying the OCR (optical character recognition) process do not always produce what you might expect.

There is a simple explanation for these issues, and it all has to do with quality:

  • Quality of the original material – was the newspaper old and brittle when scanned?  Was it yellowed?  Did it have dirt on it, or creases, or lots of ink spots?
  • Quality of the source that was scanned – were the digital image and the index created from the original paper, from a microfilm of the paper, or worse, from a copy of the microfilm, or even a copy of a copy?  Every additional copy or scan degrades the resulting image, and when the OCR process is applied, the index suffers.
  • Quality of the OCR software – some are better than others
  • Quality of the writing in the original newspaper.  Did the author spell the person’s name correctly, or was something about the event misspelled?
  • Quality of the typesetting – did the typesetter set every one of the author’s words correctly?

Thus, what you are searching is not a perfect digital index that represents what was originally written by the author and newspaper publisher.

Here is an example of the OCR results from the Monticello Express newspaper from 1902 – about 115 years old, and below it the image from the newspaper.  The results are actually decent for that age, but if you check closely, there are many letters that are incorrect:

What can we do about it?  There are lots of things to try, and this article deals with changing the letters in your search criteria.  For example, if the surname you are searching for is “Wilson” and the letter “n” is often picked up by the OCR process as the letter “m”, why not search for “Wilsom”?

I guarantee that changing your search criteria will lead to an improvement of about 5 to 10% in search results.  I heard from one associate that changing letter pairs got them a 20% improvement.  That is unusual, however.

So, which letters and letter pairs are often confused?

  • rn and m  (ar en and em)
  • h and b
  • Capital D and O
  • i, l, 1, /, !, and I are all often interchanged
  • 0 and O
  • e and o
  • c and e
  • r and n
  • [, ] and l (el)
  • nl and m  (en el and em)
  • Capital R and B
  • n and ri  (en and ar eye)
  • v and y
  • Capital S and 8
  • Capital S and 5
  • Capital Z and 2
  • Capital G and 6
  • Capital B and 8
  • Capital K and |<
  • … and many more…

Some of the letters above can be “changed” by nothing more than an ink spot.  For example, an “h” can be read as a “b”, a “v” as a “y”, or an “r” as an “n” if an ink spot appears in just the right place.

My suggestion?  Change your search criteria: swap the letters in the string you are looking for with these alternative letters and letter pairs, and see what happens.  That may mean several additional searches, but it is definitely worth the added effort.  You might be pleasantly surprised!
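
If you are comfortable with a little scripting, you do not have to work out every alternative spelling by hand.  The Python sketch below is only an illustration, not part of any newspaper site's search tool.  It takes a search term and lists every spelling that differs by a single substitution.  The function name search_variants and the CONFUSIONS table are my own hypothetical names, the table holds only a handful of the lowercase pairs from the list above (plus the n and m swap from the “Wilson” example), and each pair is tried in both directions.

```python
# A few confusion pairs from the list above. The table is deliberately
# incomplete; add or remove pairs to suit the newspaper you are searching.
CONFUSIONS = [
    ("rn", "m"),
    ("nl", "m"),
    ("n", "ri"),
    ("n", "m"),   # from the "Wilson" / "Wilsom" example, not the list itself
    ("h", "b"),
    ("e", "o"),
    ("c", "e"),
    ("r", "n"),
    ("v", "y"),
    ("i", "l"),
    ("l", "1"),
]


def search_variants(term):
    """Return the term plus every spelling one substitution away from it."""
    # OCR can confuse a pair in either direction, so try both.
    swaps = CONFUSIONS + [(b, a) for a, b in CONFUSIONS]
    variants = {term}
    for old, new in swaps:
        # Replace each occurrence of `old` separately, one at a time.
        start = term.find(old)
        while start != -1:
            variants.add(term[:start] + new + term[start + len(old):])
            start = term.find(old, start + 1)
    return sorted(variants)


# Print one candidate search term per line.
for variant in search_variants("Wilson"):
    print(variant)
```

Each line of output is one more search to try.  The sketch changes only one spot in the name at a time, which keeps the list short; names with several doubtful letters would need more than one pass.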