Digitizing Old Text and Fighting Spam, Too
By Phil Berardelli
ScienceNOW Daily News
12 August 2008
The next time a Web site asks you to read a string of crooked letters as a security precaution, don't grimace. You could be helping to digitize a deteriorating historical document. A team of computer scientists has taken a common Internet tool for screening out spam and adapted it to help convert text from old books and manuscripts into electronic files. The effort might not put professional transcribers out of business, but it could cut the cost of creating digital libraries.
In the battle between Web security designers and spammers, programs called Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) have proven an effective foil. The programs require online users to read a distorted word or line of text and retype it in a designated box--something that few optical scanners or digital-text readers can do. Insidious programs deployed by spammers can penetrate sites such as Gmail and lift their e-mail address lists. CAPTCHAs block the attempt by requiring an extra step before providing access. They are used online about 200 million times every day.
Computer scientist Luis von Ahn of Carnegie Mellon University in Pittsburgh, Pennsylvania, and colleagues thought all that effort could be put to another use, too. "Since each [CAPTCHA] takes about 10 seconds of human time," von Ahn says, "we figured humanity as a whole was wasting about 500,000 hours every day typing." And that much time constituted a valuable resource in efforts to digitize old books with deteriorating pages and faded text.
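Von Ahn's estimate can be checked against the two figures the article quotes: about 200 million CAPTCHAs a day, at about 10 seconds each. The short Python sketch below does that arithmetic. The constant and function names are made up for illustration, and both inputs are the article's rough numbers, not measured data.

```python
# Figures quoted in the article; both are rough estimates.
CAPTCHAS_PER_DAY = 200_000_000  # "about 200 million times every day"
SECONDS_PER_CAPTCHA = 10        # "about 10 seconds of human time"


def daily_human_hours(captchas_per_day: int, seconds_each: float) -> float:
    """Return the total human time spent per day, in hours."""
    return captchas_per_day * seconds_each / 3600


hours = daily_human_hours(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
print(f"{hours:,.0f} hours per day")  # prints "555,556 hours per day"
```

The result, roughly 556,000 hours, is consistent with the "about 500,000 hours" in the quote.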
NowPublic uses CAPTCHA and is all about serving the public interest; as a user, I'll be extra happy to input those pesky characters if it means I'm helping preserve history and broaden/deepen the human knowledge base!
Erik Larson
Washington, District Of Columbia, United States




Comments (1)
at 14:10 on August 15th, 2008
Spam sometimes still passes through the system though! So annoying.