Digitizing Old Text and Fighting Spam, Too

by Erik Larson | August 15, 2008 at 02:00 pm

By Phil Berardelli
ScienceNOW Daily News
12 August 2008

The next time a Web site asks you to read a string of crooked letters as a security precaution, don't grimace. You could be helping to digitize a deteriorating historical document. A team of computer scientists has taken a common Internet tool for screening out spam and adapted it to help convert text from old books and manuscripts into electronic files. The effort might not put professional transcribers out of business, but it could cut the cost of creating digital libraries.

In the battle between Web security designers and spammers, programs called Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) have proven an effective foil. The programs require online users to read a distorted word or line of text and retype it in a designated box--something that few optical scanners or digital-text readers can do. Insidious programs deployed by spammers can penetrate sites such as Gmail and lift their e-mail address lists. CAPTCHAs block the attempt by requiring an extra step before providing access. They are used online about 200 million times every day.

Computer scientist Luis von Ahn of Carnegie Mellon University in Pittsburgh, Pennsylvania, and colleagues thought all that effort could be put to another use, too. "Since each [CAPTCHA] takes about 10 seconds of human time," von Ahn says, "we figured humanity as a whole was wasting about 500,000 hours every day typing." And that much time constituted a valuable resource in efforts to digitize old books with deteriorating pages and faded text.
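The estimate follows from the two figures quoted above: about 200 million CAPTCHAs a day at roughly 10 seconds each. A quick check in Python (the variable names are illustrative, not from the article) suggests the raw product is a little over 550,000 hours, which von Ahn's "about 500,000" rounds down:

```python
# Figures quoted in the article; the names are illustrative only.
captchas_per_day = 200_000_000   # "about 200 million times every day"
seconds_per_captcha = 10         # "about 10 seconds of human time"

# Total human time spent per day, converted from seconds to hours.
total_seconds = captchas_per_day * seconds_per_captcha
hours_per_day = total_seconds / 3600  # 3600 seconds in an hour

print(f"{hours_per_day:,.0f} hours per day")
```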


NowPublic uses CAPTCHA and is all about serving the public interest; as a user, I'll be extra happy to input those pesky characters if it means I'm helping preserve history and broaden/deepen the human knowledge base!

Heiky

Spam sometimes still passes through the system though! So annoying.
