Anyone who has filled out an online form has surely encountered a CAPTCHA, or Completely Automated Public Turing test to tell Computers and Humans Apart. This is the test that asks you to type the words you see in the box to prove you are a real person and not an evil bot programmed to generate spam. While this process may seem tedious, one group has gone a step further to make it useful for the greater good.
There are many CAPTCHA services available, but the one used most widely across the web is reCAPTCHA. This service was originally created by researchers at Carnegie Mellon University before being acquired by Google in 2009.
Over the years, Google has been in the business of digitizing books. To achieve this, Google has hired many people to scan book after book after book, one page at a time.
With every page that is scanned, special optical character recognition (OCR) software is used to read and digitize every word, essentially pulling the text from the paper it’s printed on. OCR, however, can only do so much. Sometimes pages are scanned awkwardly, leaving the text twisted, skewed, or otherwise distorted, which makes it too difficult for OCR to read.
For words that OCR cannot recognize, human intervention is required. Instead of paying people to guess what the mangled words actually say, the brilliant minds behind reCAPTCHA came up with a better solution. People all around the world are already using reCAPTCHA to prove they are human, so why not have them decipher those words at the same time?
You may have noticed that when you encounter a reCAPTCHA challenge, there are two words in the box that you have to type out. Google’s servers already know what one of the words is. The other word is one that OCR could not recognize.
What reCAPTCHA does is ask users to type in what they think the garbled word actually says, along with the word Google knows. Since CAPTCHAs are filled out by millions of people, this allows the service to ask a vast group what they think any one word says. And if a user typed the known word correctly, it can be assumed that their answer for the unknown word is right (or at least a close guess), too.
Once the reCAPTCHA service compiles enough of these guesses, it can determine with high accuracy what the word is. After that, the word is considered digitized. So, not only are you proving that you are human, you are also helping to digitize books!
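To make the idea concrete, here is a minimal Python sketch of the two-word scheme described above. It is only an illustration, not Google's actual implementation: the function names, the word IDs, and the thresholds (at least 3 accepted guesses, at least 75% agreement) are all assumptions invented for this example.

```python
from collections import Counter, defaultdict

# Hypothetical thresholds, chosen only for illustration.
MIN_VOTES = 3         # accepted guesses needed before deciding
MIN_AGREEMENT = 0.75  # share of guesses that must agree

# Maps an unknown word's ID to a tally of the guesses made for it.
votes = defaultdict(Counter)


def submit(control_answer, control_typed, unknown_id, unknown_typed):
    """Record a guess for the unknown word only if the known word matches."""
    if control_typed.strip().lower() != control_answer.lower():
        return False  # failed the human check, so the guess is discarded
    votes[unknown_id][unknown_typed.strip().lower()] += 1
    return True


def consensus(unknown_id):
    """Return the agreed-upon word, or None if there is no clear winner yet."""
    tally = votes[unknown_id]
    total = sum(tally.values())
    if total < MIN_VOTES:
        return None
    word, count = tally.most_common(1)[0]
    return word if count / total >= MIN_AGREEMENT else None
```

In this sketch, a call such as submit("morning", "mourning", "scan-17", "overbooks") is rejected because the known word was typed incorrectly, so its guess never counts. Once three accepted users have typed "overlooks" for "scan-17", consensus("scan-17") returns "overlooks".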
You may have noticed recently that Google has used the reCAPTCHA project for other tasks where OCR fails. For instance, you may have been asked to type in a house number or a street sign captured by a Google Street View camera.
Just like with books, Google is using reCAPTCHA and the power of the people to improve its Maps and navigation software, which in turn benefits everyone using those services.
Duolingo, a free online service for learning another language, follows the same premise as reCAPTCHA. Users practice by translating phrases and sentences from real websites around the world, so as individuals learn a new language, they also help to translate the web!