Researchers are now reporting on a successful way to identify the words that computers can't handle: turn them into CAPTCHAs, and get people to do the work...
Scanned text is subjected to analysis by two optical character recognition programs; in cases where the programs disagree, the questionable word is converted into a CAPTCHA. It is then distributed to participating websites along with a control word of known identity (used to catch cases where a bot is trying to crack the CAPTCHA)...
Each OCR software program managed about 84 percent accuracy but, when their results were combined with the reCAPTCHA system, the overall accuracy shot up to 99.1 percent.
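To make the workflow above a little more concrete, here is a minimal Python sketch of the idea, not the actual reCAPTCHA implementation. It assumes the two OCR outputs arrive as word lists of equal length, and every name in it (`disputed_positions`, `record_answer`, `resolve`, and the `min_votes` threshold) is hypothetical. The quoted article does not say how many human answers are needed before a word is accepted, so the threshold of two is purely an illustrative assumption.

```python
from collections import Counter


def disputed_positions(ocr_a, ocr_b):
    """Return the indices where the two OCR outputs disagree.

    Assumption: both outputs are already aligned word for word.
    """
    if len(ocr_a) != len(ocr_b):
        raise ValueError("this sketch assumes word-aligned OCR outputs")
    return [i for i, (a, b) in enumerate(zip(ocr_a, ocr_b)) if a != b]


def record_answer(votes, position, control_word, typed_control, typed_unknown):
    """Store a guess for the disputed word only if the control word is right.

    A wrong control word suggests a bot or a careless answer, so the
    accompanying guess is discarded. Returns True if the guess was counted.
    """
    if typed_control.strip().lower() != control_word.lower():
        return False
    guess = typed_unknown.strip().lower()
    votes.setdefault(position, Counter())[guess] += 1
    return True


def resolve(votes, position, min_votes=2):
    """Return the most common guess once it has min_votes, otherwise None.

    min_votes is a hypothetical threshold chosen for illustration.
    """
    counter = votes.get(position)
    if not counter:
        return None
    word, count = counter.most_common(1)[0]
    return word if count >= min_votes else None


if __name__ == "__main__":
    ocr_a = ["the", "qu1ck", "brown", "fox"]
    ocr_b = ["the", "quick", "brovvn", "fox"]
    print(disputed_positions(ocr_a, ocr_b))

    votes = {}
    record_answer(votes, 1, "morning", "morning", "quick")
    record_answer(votes, 1, "morning", "mornin", "quack")  # control word wrong
    record_answer(votes, 1, "morning", "Morning", "Quick")
    print(resolve(votes, 1))
```

The control word does the gatekeeping here: a guess for the unknown word is only trusted when the same person typed the known word correctly, which is the role the article describes for it.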
Sunday, August 17, 2008
Recovering text from damaged historical manuscripts with CAPTCHAs: