Wednesday, May 28, 2008

SSP 2008 - Digital Preservation Part 3

Radhika Nurati – Apex CoVantage

{Digitizing backfiles}

Business objectives: preservation/genealogy (Mormons etc)/research/access/revenue generation

Issues with semantics – digitization can bring together similar content that goes by different names (e.g. the India/Britain conflict in the 1800s is called different things in each country's literature)
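A rough sketch of how variant names might be collapsed to one heading at indexing time. The talk did not say which conflict or which terms were meant, so the 1857 uprising and the mapping below are my own illustrative assumptions, and `CANONICAL_TERMS` and `normalize_subject` are hypothetical names:

```python
# Hypothetical controlled vocabulary: variant names mapped to one canonical heading.
# The entries are illustrative assumptions, not taken from the talk.
CANONICAL_TERMS = {
    "sepoy mutiny": "Indian Rebellion of 1857",
    "indian mutiny": "Indian Rebellion of 1857",
    "first war of independence": "Indian Rebellion of 1857",
}


def normalize_subject(term: str) -> str:
    """Return the canonical heading for a term, or the trimmed term if it is unknown."""
    # Lowercase and collapse runs of whitespace so OCR spacing noise does not block a match.
    key = " ".join(term.lower().split())
    return CANONICAL_TERMS.get(key, term.strip())


if __name__ == "__main__":
    for raw in ["Sepoy  Mutiny", "First War of Independence", "Crimean War"]:
        print(raw, "->", normalize_subject(raw))
```

In practice the mapping would probably come from a subject specialist or an existing thesaurus rather than a hand-written dictionary.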

Image quality/ source pages (varying quality of filming for fiche etc)/ OCR quality are big issues.

Index images when digitizing.

Break projects into pieces: do things when you have money – think about future up front, mark things to come back to later like cleaning captions.

DTD/ Specifications:
*NLM DTD is standard
*Match publishing DTD and archival DTD to make things easier
*Item boundaries – what do you consider an item? Harder to tell in older journals – not broken up as well, more like paragraphs/mishmash of formats. Decide on boundaries and give guidance to digitizer.
*Item categorization: decide on article types up front (50 types in NLM DTD) – aim for consistency and ease of use for users
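A minimal sketch of enforcing the agreed article types on files coming back from the digitizer, assuming each item is an NLM-style `<article>` element carrying an `article-type` attribute. The short list in `ALLOWED_ARTICLE_TYPES` is a hypothetical project choice, not something from the talk, and `check_article_type` is a made-up helper name:

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of article types a project might agree on with its digitizer.
ALLOWED_ARTICLE_TYPES = {
    "research-article",
    "review-article",
    "book-review",
    "editorial",
    "letter",
}


def check_article_type(xml_text: str):
    """Return (article_type, is_allowed) for one article's XML.

    article_type is None when the root element has no article-type attribute.
    """
    root = ET.fromstring(xml_text)
    article_type = root.get("article-type")
    return article_type, article_type in ALLOWED_ARTICLE_TYPES


if __name__ == "__main__":
    sample = '<article article-type="research-article"><front/></article>'
    print(check_article_type(sample))
```

Running a check like this over the pilot batch would flag items whose type is missing or outside the agreed list before full production starts.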

Digitization phases: pilot – range of types of pubs and different time periods (variations of content)/ system test – let users see if content is usable, user analysis is good/ steady-state production
