Digital library hits 1.5M-book mark


Carnegie Mellon University is behind the digitization project.

PITTSBURGH (AP) — Nearly a decade ago, computer scientists at Carnegie Mellon University embarked on a project with an astonishingly lofty goal: digitize the published works of humankind and make them freely available online.

The architects of the project said Tuesday they have surpassed their latest target, having scanned more than 1.5 million books — many of them in Chinese — and are continuing to scan thousands more daily.

“Anyone who can get on the Internet now has access to a collection of books the size of a large university library,” said Raj Reddy, a computer science and robotics professor at the university who initiated the project.

The latest phase in the development of the so-called Universal Library, the Million Book Project, began in 2002 after Reddy’s team successfully scanned 1,000 books.

Much of the recent work has been carried out by people at scanning centers in India and China, helped by $3.5 million in seed funding from the National Science Foundation and in-kind contributions from computer hardware and software makers.

The governments of China and India each have paid for 500 people to scan books in their respective countries for the duration of the project — a contribution valued at roughly $10 million each, according to Reddy.

Partners in the project include Zhejiang University in China, the Indian Institute of Science in India and the Library at Alexandria in Egypt. European institutions so far have declined to participate in the project.

At least half the books are out of copyright or scanned with the permission of copyright holders. Only excerpts of copyright-protected works are available for now, though organizers expect the complete texts to become available eventually.

The project is not unique. Online search engine operator Google Inc. and software giant Microsoft Corp. have begun similar endeavors, though Carnegie Mellon representatives say theirs is the largest university-based digital library of free books and that its purpose is noncommercial.

It’s a step toward the creation of an online library that would make traditionally published books and other content available to all, Reddy said. “The economic barriers to the distribution of knowledge are falling,” he said in a statement.

Based on catalogs and bibliographies, Carnegie Mellon experts estimate about 100 million books have been published since the dawn of time, said Michael Shamos, a computer science professor and copyright lawyer working on the project.

The library’s mission includes preserving rare and decaying texts and providing a resource for distance learning, among other things, he said. And organizers hope it will ultimately encompass music, artwork, newspapers and other content, Shamos added.

“It’s important for this to be free,” he said. “In the United States, yes, you can get people to pay $10 or $20 a month for some kind of service. ... But there are other countries where $10 or $20 a month is their entire income.”

Books scanned so far have been borrowed from libraries or donated by philanthropic individuals, in some cases shipped across oceans, he said.

The library currently contains books published in 20 languages, including 970,000 in Chinese, 360,000 in English, 50,000 in the southern Indian language of Telugu and 40,000 in Arabic.

Text appears only in the language in which it was published, but translation technology may eventually provide it in the language of the user’s choosing.

Reddy, the project’s originator, said he favored a single portal for books digitized by Carnegie Mellon and others such as Google.

“I think it will happen,” he said. “Give it two or three years.”