How to Build a Digital Library ( 2 )

Publication series :2

Author: Witten, Ian H.; Bainbridge, David; Nichols, David M.

Publisher: Elsevier Science

Publication year: 2009

E-ISBN: 9780080890395

P-ISBN(Paperback): 9780123748577

P-ISBN(Hardback):  9780123748577

Subject: TP2 Automation technology and equipment; TP39 Computer applications

Language: ENG



Description

How to Build a Digital Library reviews the knowledge and tools needed to construct and maintain a digital library, regardless of its size or purpose. It is a resource for individuals, agencies, and institutions wishing to put this powerful tool to work in their burgeoning information treasuries.

The Second Edition reflects developments in the field as well as in the Greenstone Digital Library open source software. In Part I, the authors have added an entirely new chapter on user groups, user support, collaborative browsing, user contributions, and so on. There is also new material on content-based queries, map-based queries, and cross-media queries. Multimedia receives increased emphasis through a "digitizing" section added for each major media type. A new chapter has also been added on "internationalization," which addresses Unicode standards, multi-language interfaces and collections, and issues with non-European languages (Chinese, Hindi, etc.).

Part II, the software tools section, has been completely rewritten to reflect the new developments in Greenstone Digital Library Software, an internationally popular open source software tool with a comprehensive graphical facility for creating and maintaining digital libraries.

  • Outlines the history of libraries, both traditional and digital
  • Written for both technical and non-technical audiences, covering the entire spectrum of media, including text, images, audio, video, and related XML standards

Chapter

1.2 The Changing Face of Libraries

In the beginning

The information explosion

The Alexandrian principle

Early technodreams

The library catalog

The changing nature of books

1.3 Searching for Sophocles

1.4 Digital Libraries in Developing Countries

Disseminating humanitarian information

Disaster relief

Preserving indigenous culture

Locally produced information

The technological infrastructure

1.5 The Pen Is Mighty: Wield It Wisely

Copyright law

The public domain

Relinquishing copyright

Digital rights management

Copyright and digitization

Collecting from the Web

Illegal and harmful material

Cultural sensitivity

1.6 Planning a Digital Library

1.7 Implementing a Digital Library: The Greenstone Software

1.8 Notes and Sources

Chapter 2 People in digital libraries

2.1 Roles

Global users

Roles of librarians

Change

2.2 Identity

Anonymous use

Authenticated use

Recording usage data

2.3 Help and User Support Services

2.4 Working with Digital Collections

Using information from digital libraries

Referring to objects in a digital library

Berry-picking

2.5 User Contributions

Annotations

Keywords

Ratings

Corrections

New documents

Partial and fluid documents

2.6 Notes and Sources

Chapter 3 Presentation

From People to Presentation

3.1 Presenting Textual Documents

Documents, chapters, sections

Unstructured text documents

Page images

Images with text

Realistic books

3.2 Presenting Multimedia Documents

Sound and pictures

Video

Music

3.3 Document Surrogates

Metadata

Multimedia surrogates

3.4 Searching

Types of queries

Case-folding and stemming

Phrase searching

Query interfaces

Searching multimedia

Searching music

Searching images

3.5 Metadata Browsing

Lists

Dates

Hierarchies

Facets

3.6 Putting It All Together

An institutional repository

3.7 Notes and Sources

Chapter 4 Textual documents

4.1 Representing Textual Documents

ASCII

Unicode

Plain text

Indexing

Word segmentation

4.2 Textual Images

Scanning

Optical character recognition

Acquisition, cleanup, and page analysis

Recognition

Checking and saving

Page handling

Planning an image digitization project

Inside an OCR shop

An example project

4.3 Web Documents: HTML and XML

Markup and stylesheet languages

Basic HTML

Using HTML in a digital library

Basic XML

Parsing XML

Using XML in a digital library

4.4 Presenting Web Documents: CSS and XSL

CSS

Cascading style sheets

Context- and media-dependent formatting

Extensible stylesheet language

Using Formatting Objects

Context- and media-dependent formatting

Processing in XSL

4.5 Page Description Languages: PostScript and PDF

PostScript fundamentals

The language

Evolution

Encapsulated PostScript

Fonts

Font formats

Composite fonts

Compatibility with Unicode

Text extraction

A simple text extraction program

Improving the output

Using PostScript in a digital library

Portable Document Format: PDF

Inside a PDF file

Features of PDF

Linearized PDF

Security and PDF documents

PDF and PostScript

4.6 Word-Processor Documents

Rich Text Format: RTF

Basic types

Backward compatibility

File structure

Other features

Using RTF in a digital library

Native Word formats

Using native Word in a digital library

Office Open XML: OOXML

Open Document format: ODF

Open Document files

Formatting

Using ODF in a digital library

Scientific documents: LaTeX

Using LaTeX in a digital library

4.7 Other Documents

Spreadsheets and presentation files

E-mail

4.8 Notes and Sources

Chapter 5 Multimedia

5.1 Introducing Compression and Transforms

Basic compression techniques

Transforms

The Fourier transform

5.2 Audio

Pulse code modulation: PCM

Variants of PCM

Early formats: WAV, AIFF, AU

MPEG audio: MP3 and its siblings

Post-MP3 formats: AAC, Ogg Vorbis, FLAC

Replaying audio

An audio digital library

5.3 Images

Lossless compression: GIF and PNG

Lossy compression: JPEG

Progressive refinement

Archiving images: JPEG 2000 and TIFF

A digital library of photographs

Vector graphics images

5.4 Video

Codecs

Multimedia compression: MPEG

Inside MPEG

MPEG-1

Mixing media

MPEG-2

MPEG-4

High Definition Digital Television

Proprietary formats

Streaming

Ogg Theora

Using multimedia in a digital library

A video digital library

Reflection

5.5 Rich Media

Synchronized Multimedia Integration Language: SMIL

Adobe Flash

5.6 Music

Musical Instrument Digital Interface: MIDI

Channel Events

Meta Events

System Exclusive Events

Digital music libraries

5.7 Notes and Sources

Audio

Images

Video

Rich Media

Music

Chapter 6 Metadata

6.1 Characteristics of Metadata

6.2 Bibliographic Metadata

MARC

MARCXML

Dublin Core: DC

Qualified Dublin Core

Metadata Object Description Schema: MODS

BibTeX

EndNote

6.3 Metadata for Multimedia

Image metadata: TIFF

Image metadata: EXIF, XMP, IPTC, and MIX

Audio metadata

Video metadata

Multimedia metadata: MPEG-7

Multimedia application metadata: MPEG-21

6.4 Metadata for Compound Objects

Resource Description Framework: RDF

Metadata Encoding and Transmission Standard: METS

Collection-level metadata

Open Archives Initiative Object Reuse and Exchange: OAI-ORE

Metadata for education: LOM and SCORM

Metadata for eResearch

6.5 Metadata Quality

Authority control: Names

Authority control: Subjects

Controlling metadata values

Metadata tools

6.6 Extracting Metadata

Extracting document metadata

Generic entity extraction

Bibliographic references

Language identification

Acronym extraction

Key-phrase metadata

Key-phrase extraction

Key-phrase indexing

6.7 Notes and Sources

Chapter 7 Interoperability

7.1 Z39.50 Protocol

7.2 Open Archives Initiative

OAI Protocol for Metadata Harvesting: OAI-PMH

Serving

Harvesting

7.3 Object Identification

Handles

Digital object identifiers: DOIs

OpenURLs

Persistence

7.4 Web Services

Search/Retrieval via URL: SRU

7.5 Authentication and Security

7.6 DSpace and Fedora

DSpace

Fedora

7.7 Notes and Sources

Chapter 8 Internationalization

8.1 Multilingual interfaces and documents

8.2 Unicode

Composite and combining characters

Unicode character encodings

UTF-32

UTF-16

UTF-8

Using Unicode in a digital library

8.3 Hindi and Indic scripts

ISCII: Indian Script Code for Information Interchange

Unicode for Indic scripts

Problems with the adoption of Unicode

8.4 Word segmentation and sorting

Segmenting words

Segmenting words in Thai/Khmer/Lao

Sorting Chinese text

8.5 Notes and sources

Chapter 9 Visions

9.1 Libraries of the future

Today’s visions

Tomorrow’s visions

Working inside the digital library

9.2 Preserving the past

The problem of preservation

A sorry tale

The Domesday Project

Demise

Resurrection

The digital dark ages

Preservation strategies

Preservation in practice

The Internet Archive

9.3 Trends in digital libraries

Mobility: Portable collections

Knowledge-based information retrieval

9.4 Digital libraries for oral cultures

9.5 Notes and sources

Part II Greenstone Digital Library Software

Chapter 10 Building collections

10.1 The Reader’s Interface

The Greenstone digital library

Exploring the Demo collection

Browsing

Searching

Preferences

10.2 The Librarian Interface

Users and functions

A walk-through

Getting started

Assembling the source material

Enriching the documents

Designing the collection

Building the collection

Formatting the pages

Previewing

Help

10.3 Working with Documents

HTML documents

Word and PDF files

Enriching with metadata

Designing the collection

Changing the format

Enhanced Word document handling

Extracting document structure

Detecting user-defined styles

Extracting document properties

Enhanced PDF document handling

Trouble-shooting PDF collections

Switching modes in the Librarian interface

Splitting PDF documents into sections

Converting PDF documents to page images

Working with mixed PDF collections

Highlighting search terms

Enhanced HTML document handling

Extracting metadata

Extracting document structure

Metadata for hierarchical documents

Scaling up

Examining different file types

Adding metadata

Adding a hierarchy classifier

Partitioning the index

Viewing and refining the collection

10.4 Formatting

The Format panel

Format Features

Default format statements

Format strings

Conditionals

Referring to document nodes and multiple-valued metadata

Predefined items

Formatting exercise 1: Tudor collection

Pointing to documents on the Web

Formatting exercise 2: Word and PDF collection

Formatting exercise 3: Branding your collection

10.5 Dealing with Metadata

The Enrich panel

Metadata sets

Reviewing assigned metadata

Importing documents with assigned metadata

How metadata is stored

Collections of bibliographic information

Building the collection

Using the metadata

Searching

Controlling the conversion

Working with individual metadata records

Exploding the MARC database

Tidying up the search facility

Formatting the metadata records

Combining metadata and source documents

10.6 Non-Textual Documents

Images

Changing the thumbnail size

Adding metadata

Changing the format to view the new metadata

Adding a classifier and index

Textual images

Structure of the newspapers

Building a collection

Grouping documents by series

Displaying scanned images

Searching at page level

Adding new documents

Controlling document processing

Switching between images and text

Multimedia

Exploring the Beatles collection

Building the collection

Manually correcting metadata

Browsing by media type

Compacting the titles

Improving the formatting

Using UnknownPlugin

Add phrase and collage browsers

Customizing the appearance

10.7 Learning More

Sources of information

The user community

When things go wrong …

Chapter 11 Operating and interoperating

11.1 Inside Greenstone

Updating the software

Files and folders

Collections

Greenstone CD-ROM/DVDs

11.2 Operational Aspects

Configuration files

Logging

Administration facility

Authentication

Protecting a collection

11.3 Command-Line Operation

Getting started

Making a framework

Importing documents

Building indexes

Installing the collection

11.4 Under the Hood

Importing and building

Incremental building

Scheduled rebuilding

Archive formats

Document identifiers

Plug-ins

HTMLPlugin (.htm, .html; also .shtml, .shm, .asp, .php, .cgi)

WORDPlugin (.doc) and RTFPlugin (.rtf)

PDFPlugin (.pdf)

PostScriptPlugin (.ps)

ImagePlugin (.jpg, .jpeg, .gif, .png, .bmp, .xbm, .tif, .tiff)

TextPlugin (.txt, .text)

EmailPlugin (.email)

ZIPPlugin (.gz, .z, .tgz, .taz, .bz, .zip, .jar, .tar)

NulPlugin (.nul)

ISISPlugin (.mst)

Search indexes

Adding and configuring indexes

Partitioning indexes

Experimenting with MGPP and Lucene

11.5 Interoperating

Downloading Web sites

Metadata protocols

Serving OAI

Exporting collections

Interoperating with DSpace

11.6 Distributed Operation

Remote Librarian interface

Setting up the GLI Server

Using the GLI Client

Using the GLI Applet

Institutional repositories

11.7 Large-Scale Usage

Limitations of the Librarian interface

Large collections

A very large collection

Full-text index

Metadata database

Image server

Building the collection

Distributed serving

Chapter 12 Design patterns for advanced user interfaces

12.1 Format Statements and Macros

Format statements

Macros

Commonly used macros

12.2 Design Patterns

Design pattern 1: Additional static pages

Design pattern 2: Using JavaScript to adjust presentation

Design pattern 3: Making format statements reusable through macro definitions

Design pattern 4: Dynamic HTML

Opening and closing tables interactively

Adding table headers

Design pattern 5: Exploiting Asynchronous JavaScript and XML (AJAX)

Server-side checksums

Digital music stand

12.3 The Greenstone Research Project

Research with Greenstone3

Reconciling research and production values

Closing words

Glossary

References

Index
