SSP 2008 - Digital Preservation Part 3
Radhika Nurati – Apex CoVantage
Business objectives: preservation, genealogy (Mormons etc.), research, access, revenue generation.
Issues with semantics: digitization can bring together similar content that goes by different names (e.g. the India/Britain conflict in the 1800s is called different things in the literature of each country).
Image quality, source page quality (varying quality of filming for fiche etc.) and OCR quality are big issues.
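One way to handle the naming problem is a small synonym table that maps each variant name to a single subject label at indexing time. The sketch below is only an illustration: the table, the function names, and the choice of the 1857 uprising as the example conflict are all assumptions, not something named in the talk.

```python
# Hypothetical synonym table: one canonical subject label -> variant names.
# The terms are assumed for illustration; the talk did not name the conflict.
SUBJECT_SYNONYMS = {
    "indian rebellion of 1857": [
        "sepoy mutiny",
        "indian mutiny",
        "first war of independence",
    ],
}


def build_lookup(synonyms):
    """Invert the table so every variant points at its canonical label."""
    lookup = {}
    for canonical, variants in synonyms.items():
        lookup[canonical] = canonical
        for variant in variants:
            lookup[variant] = canonical
    return lookup


def canonical_subject(term, lookup):
    """Return the canonical label, or the normalized term if it is unknown."""
    key = " ".join(term.lower().split())  # lowercase and collapse whitespace
    return lookup.get(key, key)


LOOKUP = build_lookup(SUBJECT_SYNONYMS)
```

Indexing both a British and an Indian source under the same label lets a single search retrieve both, while unknown terms pass through unchanged.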
Index images when digitizing.
Break projects into pieces and do the work as money becomes available. Think about the future up front, and mark things to come back to later, like cleaning captions.
*NLM DTD is the standard
*Match publishing DTD and archival DTD to make things easier
*Item boundaries: what do you consider an item? Harder to tell in older journals, which are not broken up as cleanly and read more like runs of paragraphs or a mishmash of formats. Decide on boundaries and give guidance to the digitizer.
*Item categorization: decide on article types (50 types in NLM DTD); aim for consistency and ease of use for users
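The categorization point can be sketched as a fixed list of allowed article types plus a mapping from the labels found in older print journals. This is a minimal sketch under stated assumptions: the type values shown are a small subset assumed for illustration and should be checked against the DTD's own documentation, and the legacy labels and the `other` fallback are hypothetical.

```python
# Assumed subset of article-type values; verify against the DTD documentation.
ALLOWED_TYPES = {
    "research-article",
    "review-article",
    "book-review",
    "editorial",
    "letter",
    "obituary",
}

# Hypothetical labels as printed in older journals, mapped to allowed values.
LEGACY_LABELS = {
    "original paper": "research-article",
    "correspondence": "letter",
    "notes and reviews": "book-review",
    "in memoriam": "obituary",
}

FALLBACK_TYPE = "other"  # hypothetical catch-all, flagged for later review


def categorize(label):
    """Map a printed section label to one agreed article type."""
    key = " ".join(label.lower().split())
    if key in ALLOWED_TYPES:
        return key
    return LEGACY_LABELS.get(key, FALLBACK_TYPE)
```

Giving the digitizer a table like this, rather than free-text guidance, is one way to keep categories consistent across a long run of volumes.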
Digitization phases: pilot (cover a range of publication types and different time periods to capture variations in content) / system test (let users see if the content is usable; user analysis is good) / steady-state production
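For the pilot phase, one simple way to cover a range of publication types and time periods is to group the inventory by type and decade and take a few items from each group. The sketch below assumes a hypothetical inventory of dictionaries with `pub_type` and `year` fields. Grouping by decade is an arbitrary choice made for the example.

```python
from collections import defaultdict


def pick_pilot_sample(items, per_group=1):
    """Pick up to per_group items for each (publication type, decade) pair."""
    groups = defaultdict(list)
    for item in items:
        decade = item["year"] // 10 * 10  # e.g. 1898 -> 1890
        groups[(item["pub_type"], decade)].append(item)

    sample = []
    for key in sorted(groups):  # stable, repeatable ordering of groups
        sample.extend(groups[key][:per_group])
    return sample


# Hypothetical inventory records, for illustration only.
inventory = [
    {"id": 1, "pub_type": "journal", "year": 1895},
    {"id": 2, "pub_type": "journal", "year": 1898},
    {"id": 3, "pub_type": "journal", "year": 1952},
    {"id": 4, "pub_type": "newsletter", "year": 1952},
]
pilot = pick_pilot_sample(inventory)
```

With this inventory the pilot contains one item from each of the three type/decade groups, so both the 1890s and 1950s journal layouts are exercised before steady-state production begins.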