Building the Digital Collection

The 59 diary volumes, 82 photographs and works of art, 43 contemporary maps, and seven trail guides found in Trails of Hope were digitized at the Digital Imaging Center, Harold B. Lee Library, Brigham Young University; the Digital Technologies Department, Marriott Library, University of Utah; Special Collections, University of Nevada, Reno, Library; the Church Archives of the Church of Jesus Christ of Latter-day Saints; the Utah State Historical Society; and the Churchill County Museum, Fallon, Nevada. In all, ca. 7000 images of trail diaries and letters, 2350 images of trail guides, 82 images of photographs, paintings or prints, and 43 map images were scanned for the project.

Diaries and Letters

The original diaries and letters from 1846-1869 are the centerpiece of Trails of Hope. The Lee Library, Digital Imaging Center, Brigham Young University scanned 53 diary volumes, and the Digital Technologies Department, Marriott Library, University of Utah, scanned the remaining six volumes.

The Lee Library diary and letter scanning was done on a variety of flatbed scanners using various scanning software packages: Agfa Arcus II and DuoScan scanners using FotoLook software, a Heidelberg Opal Ultra scanner using Lincolor Elite software, an Epson Expression 836XL scanner using SilverFast software, and an Eversmart Jazz+ scanner using Eversmart Scan software. To produce more consistent file sizes and a more efficient workflow, while still retaining sufficient detail in the page image, pages were scanned to a fixed number of pixels across the longest dimension of the page rather than to a set number of pixels per inch.

The pixel dimension chosen was 2500 pixels across the long dimension at a bit depth of 24. This produced archival master files of approximately 15 MB as uncompressed TIFF images and was judged to capture all relevant data for text pages. Scanned images were adjusted in Photoshop (minor enhancements for faded ink or pencil, cropping, and color-balance adjustment where necessary), and a Photoshop batch process was run to add a black background with attribution information at the foot of each image. This batch process was done for all images in the collection except maps. The legibility of the manuscript pages was the highest priority for any image enhancements. As a rule, blank diary pages were not scanned.
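The arithmetic behind the fixed-pixel approach can be sketched as follows. This is a minimal illustration, not the project's actual tooling: the function names and the two page sizes are hypothetical, and the size figure covers raw pixel data only (TIFF header overhead is ignored).

```python
def uncompressed_size_bytes(width_px: int, height_px: int, bit_depth: int = 24) -> int:
    """Raw pixel data size of an uncompressed image (file header overhead ignored)."""
    return width_px * height_px * bit_depth // 8


def effective_ppi(long_edge_px: int, long_edge_inches: float) -> float:
    """Pixels per inch that result when a page is scanned to a fixed pixel count."""
    return long_edge_px / long_edge_inches


# Hypothetical page sizes (long edge, short edge) in inches; not taken from the collection.
pages = {"pocket diary": (6.0, 4.0), "ledger": (12.5, 8.0)}

for name, (long_in, short_in) in pages.items():
    long_px = 2500  # fixed long-edge dimension described in the text
    short_px = round(long_px * short_in / long_in)
    size_mb = uncompressed_size_bytes(long_px, short_px) / 1_000_000
    print(f"{name}: {long_px} x {short_px} px, "
          f"{effective_ppi(long_px, long_in):.0f} ppi, {size_mb:.1f} MB")
```

Under these assumptions the file size depends only on the page's proportions, while the effective pixels per inch vary with its physical size. A page captured at 2500 x 2000 pixels and 24 bits, for instance, occupies 15,000,000 bytes, consistent with the approximate master size quoted above.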

Following scanning at the Lee Library, the files were checked against the diary for completeness and archival TIFF masters were burned to CD-R. One set of reference files at 150 dpi and another set at 800 pixels in width were created for each page. The first set of derivatives was made for use by the transcription team. The second set of reference files (800 pixels) was created for web display. Thumbnails were only created as a reference for each major object, i.e., diaries, photographs, maps, or trail guides, and not for individual pages of the diaries and trail guides.
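The pixel dimensions of the two derivative sets can be worked out with simple proportions. The sketch below is illustrative only; the function names and the sample page measurements are assumptions, and it computes target dimensions rather than resampling any image.

```python
def scale_to_width(width_px: int, height_px: int, target_width: int = 800) -> tuple[int, int]:
    """Dimensions of a web derivative scaled proportionally to a fixed width."""
    return target_width, round(height_px * target_width / width_px)


def dimensions_at_dpi(width_in: float, height_in: float, dpi: int = 150) -> tuple[int, int]:
    """Dimensions of a reference file made at a set resolution for a page of known size."""
    return round(width_in * dpi), round(height_in * dpi)


# Hypothetical master: 2000 x 2500 pixels, from a page measuring 8 x 10 inches.
print(scale_to_width(2000, 2500))
print(dimensions_at_dpi(8.0, 10.0))
```

Because the masters were scanned to a fixed pixel count rather than a fixed resolution, the 150 dpi calculation needs the physical page size as an input, which is why it is passed in explicitly here.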

The remaining six diaries were scanned at the J. Willard Marriott Library, University of Utah, using a Leica S1 Pro Digital Camera with Hasselblad lenses and SilverFast software. The camera used a capture resolution of 5140 x 5140 pixels, resulting in uncompressed 24-bit TIFF images of a maximum of 75 MB. Reference files were shipped to Brigham Young University along with the text transcriptions for XML encoding.

Transcription and Mark-up of the Diaries

Diary transcription and mark-up proved to be a long and evolutionary process, and one of the most interesting learning experiences in the entire project. Following long-standing procedure in the Lee Library Special Collections, transcriptions were made by keying the text into WordPerfect text files that were proofed against the original diaries. This procedure followed written guidelines using traditional archival principles and practices that maintain the integrity of the original diary. The transcriptions preserved the original spelling, line lengths, insertions and deletions. Changes of ink, the presence of disfiguring elements, and missing or blank pages were recorded in end notes. Fifty-three diary volumes and letters were transcribed at Brigham Young University. The remaining six diary volumes were transcribed at the Marriott Library, using the same guidelines, and were sent to Brigham Young University for mark-up.

An innovation in the transcription procedure for this project involved using the scanned page images as the source for the transcriptions rather than the original diaries. The transcribers used four personal computers with 21-inch monitors to key in the WordPerfect text files. A set of reference files at 150 dpi was made for the transcription team and burned to a CD-R. These page images were much more useful for transcription than the original diaries because of the magnification and image enhancement capabilities inherent in the digital images. Using multiple windows, the team could view the original diary page and the transcription side by side. Any anomalies were re-checked against the original diaries where necessary.

Another step, tried and abandoned, was to normalize spellings of personal names and place names in the WordPerfect transcription. The method developed was to key the normalized spellings in brackets in the transcription text, with the expectation that this would enhance searching capabilities. The bracketed insertions altered the line lengths and therefore compromised the integrity of the original diary. We soon discovered that this process was not scalable or sustainable.

The power of XML was discovered part way through the transcription process, when the XML mark-up team, consisting of a group of six highly motivated students under the leadership of the project manager, began to utilize the Text Encoding Initiative (TEI). The project manager developed the structure for diary mark-up and, together with the team, implemented the mark-up process from July 2000 until March 2002. The mark-up structure was devised using elements from the TEI Xlite document type definition (DTD). The mark-up scheme divides the diary into the following possible major sections: front-matter, accounts, genealogy, letter, personal history, dated diary entries (grouped by year, month/year and day/month/year), extraneous material and notes.

To increase the level of historical detail available to users for searching, the XML encoding targeted names and topics. All significant personal names, geographic names and other names were tagged and regularized with the <orig> tag. Various sources for normalizing the names were employed, but the Library of Congress name and subject authority files were used to the extent possible. Each dated entry was also encoded with a <ref> tag that assigned any of ten broad topical terms (commerce, children, death, discipline, diseases, food, Indian encounters, Mormons-religious life, religious life, and women) to the entry. These topical headings were developed in conjunction with the Curator and the head of the Metadata cataloging unit.
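A rough sketch of how such mark-up can feed a search index is given below. It is an illustration under stated assumptions, not the project's encoding: only the use of <orig> and <ref> comes from the description above, while the reg attribute on <orig>, the type attribute on <ref>, the div/n structure for dated entries, and the diary text itself are all hypothetical sample choices.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragment. The attribute names (reg, type, n) and the
# diary text are invented for illustration; they are not project data.
SAMPLE = """\
<body>
  <div type="entry" n="1849-06-12">
    <ref type="food"/>
    <p>Campt near <orig reg="Fort Laramie (Wyo.)">Ft Larimie</orig> and
    bought flour.</p>
  </div>
  <div type="entry" n="1849-06-13">
    <ref type="death"/>
    <p>Left <orig reg="Fort Laramie (Wyo.)">Larimie</orig> at sunrise.</p>
  </div>
</body>
"""


def build_indexes(xml_text):
    """Return (name index, topic index), each mapping a term to entry dates."""
    root = ET.fromstring(xml_text)
    names = defaultdict(list)
    topics = defaultdict(list)
    for entry in root.iter("div"):
        date = entry.get("n")
        # Index the regularized form so variant spellings collapse to one heading.
        for orig in entry.iter("orig"):
            names[orig.get("reg")].append(date)
        for ref in entry.iter("ref"):
            topics[ref.get("type")].append(date)
    return dict(names), dict(topics)


names, topics = build_indexes(SAMPLE)
print(names)
print(topics)
```

The point of the sketch is that the diarist's spelling stays untouched in the text while the regularized form lives in an attribute, so both variant spellings of the fort are found under a single heading.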

As the project developed, we created extensive thesauri of personal names, place names and topical terms that will facilitate the building of searchable indices. A fill-in-the-blanks template for building the TEI headers was tried, but as the team gained greater facility in mark-up, data derived from the MARC record and the metadata for the electronic files was pasted directly into the XML files. The XML mark-up was done in both WordPerfect 9.0 (with its built-in XML editor) and Altova GmbH's XML Spy (3.5 and 4.3). The mark-up included an extensive re-proofing of the transcriptions against the page-image originals, but undoubtedly some errors of transmission will remain.

PDF files available to users were also created from the TEI XML files using XSL transformations, via an HTML intermediate that was written to the PDF format. The PDF files allow faster scrolling through the entire diary and make reading and printing easier. They also render the diary texts in their original line lengths, with additions and deletions graphically displayed in green and red respectively, but with headers added to delineate the main divisions (accounts, genealogy, etc.) and chronological periods (year, month, day) of the diary.
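The colour coding of additions and deletions in the HTML intermediate can be illustrated with a short sketch. Note the substitutions: it uses Python's standard library rather than XSL, and it assumes the additions and deletions were marked with the TEI <add> and <del> elements, which the description above does not confirm. The sample line is invented.

```python
import xml.etree.ElementTree as ET
from html import escape

# Colours follow the description above: additions green, deletions red.
COLORS = {"add": "green", "del": "red"}


def render_inline(elem):
    """Render an element's content as HTML, wrapping <add>/<del> in coloured spans."""
    parts = [escape(elem.text or "")]
    for child in elem:
        inner = render_inline(child)
        color = COLORS.get(child.tag)
        parts.append(f'<span style="color:{color}">{inner}</span>' if color else inner)
        # Text following the child element belongs to the parent line.
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def line_to_html(xml_line):
    """Convert one transcribed line (hypothetically wrapped in <p>) to an HTML paragraph."""
    return f"<p>{render_inline(ET.fromstring(xml_line))}</p>"


# Invented sample line, not taken from any diary in the collection.
SAMPLE_LINE = "<p>we traveled <del>10</del><add>12</add> miles to day</p>"
print(line_to_html(SAMPLE_LINE))
```

Keeping each transcribed line as its own output element is one way the original line lengths could be preserved through to the PDF.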

Photographs and Art Work

The Lee Library, Digital Imaging Center, Brigham Young University, was responsible for digitizing 35 photographs and works of art from Special Collections, along with six modern slides from Idaho State University. Photographs and art work were scanned on a variety of flatbed scanners using various scanning software packages: Agfa Arcus II and DuoScan scanners using FotoLook software, a Heidelberg Opal Ultra scanner using Lincolor Elite software, an Epson Expression 836XL scanner using SilverFast software, and an Eversmart Jazz+ scanner using Eversmart Scan software. The pixel dimension chosen was 4000 pixels across the longest edge, compared with 2500 pixels for the diaries and letters.

The University of Nevada, Reno, used an Epson 636XL scanner with Photoshop software to digitize its contribution of thirteen photographs at a scanning resolution of 400 dpi.

The Church Archives of the Church of Jesus Christ of Latter-day Saints utilized an Agfa DuoScan scanner with FotoLook software to scan twelve photographs at a resolution of 400 dpi.

The Utah State Historical Society used an Epson 1240 U-Photo scanner with Photoshop software at a resolution of 400 dpi in digitizing eight photographs from their collection.

The Churchill County Museum's contribution of three photographs was scanned on an HP ScanJet 4C/T using PaintShop Pro software at a resolution of 300 dpi.

Each library sent a CD-ROM with TIFF masters to Brigham Young University, where reference files for web display were made from these TIFF masters.

Maps

The Digital Technologies Department, Marriott Library, University of Utah, was responsible for scanning the majority of the maps (33) using the Leica S1 Pro Digital Camera with Hasselblad lenses and SilverFast software. The camera used a capture resolution of 5140 x 5140 pixels, resulting in uncompressed 24-bit TIFF images of a maximum of 75 MB.

The ten oversized maps from the Brigham Young University collections were scanned with a Sinarcam II digital camera and Sinar digital back with Macroscan capability using a Sinaron digital 55mm f/4.5 lens, mounted on a Sinar P2 4 x 5 view camera, producing a 4000 x 6000 pixel capture image. The average file size of the 24-bit uncompressed TIFF images was 45 MB.

All of the maps for the project were saved as MrSID files (.sid) and were loaded onto the MrSID server. The maps may be viewed in an HTML browser alone, or with a Java client or the downloadable MrSID plug-in.

Trail Guides

The trail guides were scanned at the Lee Library, Digital Imaging Center, Brigham Young University, using the same equipment and procedures as the diaries. At present the trail guides are only available as page images. No re-keying or mark-up of the trail guides has been possible to date, but we plan to make the trail guides searchable in a future release.

Presentation

All of the collection content for Trails of Hope has been prepared for loading in CONTENTdm digital library collection software. The diaries are loaded as "compound" documents, making them available as sets of page images with transcribed text viewable side by side, along with the viewable metadata. For ease in reading, the diaries and letters are available as PDF text files. This should be helpful for readers with slow modems as well as individuals who are less concerned about seeing the original diary pages. The trail guides are available as page images, without transcription or mark-up, as are the diarist biographies.

A Photoshop batch process was run for the diaries and letters, the photographs and art work, and the trail guides to add a black background with attribution information at the foot of each image.

An HTML interface has been developed as a front end. This includes all the supporting or explanatory material for the publication that is not loaded into CONTENTdm: essays on specific trails and geographic material; acknowledgements; "Suggested Readings"; and the essays "About the Collection," "Building the Digital Collection," and "Creating the Metadata." These framing pieces were based on the excellent guidelines developed at the Library of Congress for the presentation of American Memory digital publications.

Prepared by
Robert Espinosa,
Project Manager
Digital Imaging Center,
Lee Library,
Brigham Young University