
Introduction

Work on Their Own Words was divided into identifiable and, at times, overlapping stages: selection of materials, selection of hardware and software, pre-processing and testing of materials, final processing, website development, and content development. For further information on any one of these stages, see the corresponding section below.


Selection of Materials

Careful consideration went into the selection of materials for inclusion in Their Own Words. The selection criteria favored a) materials that are difficult to access because of their rarity, b) materials from the eighteenth and nineteenth centuries, c) materials with broad interdisciplinary application, and d) materials that are "at risk" from a preservation point of view.

Following the selection and evaluation process, the materials were examined to assess how easily they could be scanned. Most of the books were either too fragile or too rigid to be scanned in their original bindings and were therefore sent either to Water Street Bindery (Lancaster, PA) to be disbound by hand or to Wert Bookbinding (Grantville, PA) to be disbound by machine. Which bindery a book went to depended on the age, quality, and originality of its binding. In a few cases, we decided not to disbind a book and left the text and binding intact.

Hardware and Software

The hardware purchased for Their Own Words included two workstations (Micron computers and Sony monitors), one external hard drive, and two flatbed scanners, all chosen with particular attention to reliability and durability. One scanner, a Hewlett-Packard Scanjet 7400c, was chosen for its large scanning area, high bit-depth for color scanning, and relatively economical price for the features included. The second, a Canon CanoScan LIDE 30, was used for scanning at lower bit-depths. The external hard drive, purchased from Western Digital, was used to store back-up copies of archival TIFF images and page transcriptions during processing.
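Because bit-depth and resolution largely determine how much storage an archival image needs, a rough size estimate can help when planning capture settings and back-up space. The following Python sketch computes the uncompressed pixel-data size of a scan. The page size, resolutions, and bit-depths in the example are illustrative assumptions rather than the project's actual settings, and real TIFF files add header data and may be compressed.

```python
def uncompressed_size_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Approximate pixel-data size of an uncompressed scan (excludes TIFF headers)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Illustrative settings only; not the project's actual capture specifications.
PAGE_W_IN, PAGE_H_IN = 6, 9  # a hypothetical 6 x 9 inch page
SETTINGS = [
    ("24-bit color, 600 dpi", 600, 24),
    ("8-bit grayscale, 300 dpi", 300, 8),
    ("1-bit bitonal, 300 dpi", 300, 1),
]

for label, dpi, bits in SETTINGS:
    size = uncompressed_size_bytes(PAGE_W_IN, PAGE_H_IN, dpi, bits)
    print(f"{label}: {size / 2**20:.1f} MiB")
```

Size grows with the square of the resolution and linearly with bit-depth, so doubling the dpi quadruples the file size, while moving from 8-bit grayscale to 24-bit color triples it.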

Software purchased included CONTENTdm 3.4 (upgraded to 3.5 following its July 2003 release), Adobe Photoshop 7.0, and ScanSoft OmniPage Pro 12.0. Dreamweaver MX, available through an institution-wide site license, was used to develop the static web pages that act as a portal to the materials in the database. Photoshop was purchased for image editing and web development needs, and was also used to create the presentation images stored in the CONTENTdm database. OmniPage Pro, used for optical character recognition (OCR), was chosen for its ease of use and local availability. OCR inevitably introduces errors, particularly with faded texts or difficult fonts; these were caught by proofreading the output and corrected by hand, yielding more accurate transcriptions.
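As a rough illustration of the proofreading step, the sketch below uses Python's standard `difflib` module to list the changes a proofreader made to raw OCR output and to compute a similarity ratio between the two versions. The sample text (a long s misread as "f") is a hypothetical example rather than project data, and `SequenceMatcher.ratio()` is a simple similarity measure, not a formal OCR accuracy metric.

```python
import difflib
from typing import List, Tuple


def ocr_similarity(ocr_text: str, corrected_text: str) -> float:
    """Similarity ratio (0.0 to 1.0) between raw OCR output and the proofread text."""
    return difflib.SequenceMatcher(None, ocr_text, corrected_text).ratio()


def corrections(ocr_text: str, corrected_text: str) -> List[Tuple[str, str, str]]:
    """List (operation, OCR text, corrected text) for every span the proofreader changed."""
    matcher = difflib.SequenceMatcher(None, ocr_text, corrected_text)
    return [
        (tag, ocr_text[i1:i2], corrected_text[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical sample: OCR misreading the long s as "f".
raw = "the firft fettlers"
proofread = "the first settlers"
print(corrections(raw, proofread))
print(f"similarity: {ocr_similarity(raw, proofread):.3f}")
```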

The decision to purchase CONTENTdm followed weeks of exploring alternatives, including other available database software and the development of a database of our own. In the end, the time needed to have an outside developer design a custom database exceeded the time and money available under the grant. Other database packages were judged unsuitable because of missing features, difficulty of use and maintenance, limited customizability, and price. After preliminarily selecting CONTENTdm as our digital collection management software, we evaluated a trial copy to determine whether it suited the project. Although the trial revealed several limitations in how we wished to display and search our materials, we determined that adjustments to the metadata could improve searchability. We were also confident that the software would continue to be developed and that future upgrades would address some of our concerns.

Pre-processing

Pre-processing covered the tests and procedural decisions that had to be settled before processing began in earnest. During this stage we determined the optimal image resolutions, bit-depths, and file types for our archival and presentation images. We also decided what metadata to collect about the original items and their digital copies. (CONTENTdm uses Dublin Core as its default metadata structure, but additional fields may be added.) Criteria for transcribing handwritten and printed texts were explored and adopted to ensure consistency of presentation and interpretation. We also determined at this time whether the physical condition or unusual typefaces of any printed texts warranted hand-transcription instead of OCR processing. Finally, we established a system for file naming. For more detailed information on the above, we have included the following documents developed by the project staff, presented in PDF format:
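CONTENTdm's default fields follow Dublin Core, and projects can add their own. The sketch below illustrates that idea in Python with the standard `xml.etree.ElementTree` module: it builds a simple XML record from the fifteen Dublin Core 1.1 elements plus project-specific fields. It does not reproduce CONTENTdm's internal storage or export format; the local namespace URI, the `binding` field, and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

DC_NS = "http://purl.org/dc/elements/1.1/"
# Hypothetical namespace for project-specific fields added beyond Dublin Core.
LOCAL_NS = "urn:example:local-fields"

# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})

# Serialize with readable "dc:" and "local:" prefixes instead of ns0/ns1.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("local", LOCAL_NS)


def dublin_core_record(
    dc_fields: Dict[str, List[str]],
    local_fields: Optional[Dict[str, str]] = None,
) -> ET.Element:
    """Build an XML record from Dublin Core elements plus optional local fields."""
    record = ET.Element("record")
    for name, values in dc_fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element; use a local field")
        for value in values:  # Dublin Core elements may repeat
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    for name, value in (local_fields or {}).items():
        ET.SubElement(record, f"{{{LOCAL_NS}}}{name}").text = value
    return record


# Placeholder values for a hypothetical item, not actual catalog data.
example = dublin_core_record(
    {
        "title": ["Example Pamphlet"],
        "creator": ["Example Author"],
        "date": ["1850"],
        "type": ["Text"],
        "format": ["image/jpeg"],
    },
    local_fields={"binding": "Disbound by hand"},
)
print(ET.tostring(example, encoding="unicode"))
```

Rejecting names outside the fifteen elements keeps standard and project-specific fields clearly separated, which makes the standard portion of each record easier to exchange with other systems.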

Final Processing

The processing of materials did not always follow a linear path, owing to the division of labor and the different preparations that different materials required. Most materials went through the following basic steps: disbinding; scanning; transcription or OCR; creating a back-up copy of the archival TIFF image and RTF transcription; creating a presentation JPEG image and TXT transcription; uploading the files into the CONTENTdm database; entering the metadata; and finally digitally "recreating" the book or collection within CONTENTdm. Some items required fewer steps, and others required additional ones.
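Since the project's file-naming system is not spelled out on this page, the following Python sketch uses a hypothetical convention to show one way of keeping a page's four files (archival TIFF, RTF transcription, presentation JPEG, TXT transcription) linked by a shared stem. The item-ID format, the `page_files` helper, and the four-digit page padding are assumptions for illustration only.

```python
import re
from dataclasses import dataclass

# Hypothetical item-ID convention: lowercase letters and digits only.
ITEM_ID_PATTERN = re.compile(r"[a-z0-9]+")


@dataclass(frozen=True)
class PageFiles:
    """The four files produced for one scanned page."""
    archival_image: str      # master TIFF image
    archival_text: str       # RTF transcription
    presentation_image: str  # JPEG for web display
    presentation_text: str   # plain-text transcription


def page_files(item_id: str, page: int) -> PageFiles:
    """Derive all file names for a page from a shared stem such as 'tow0042_0007'."""
    if not ITEM_ID_PATTERN.fullmatch(item_id):
        raise ValueError(f"invalid item ID: {item_id!r}")
    if not 1 <= page <= 9999:
        raise ValueError(f"page number out of range: {page}")
    stem = f"{item_id}_{page:04d}"  # zero-pad so names sort in page order
    return PageFiles(
        archival_image=f"{stem}.tif",
        archival_text=f"{stem}.rtf",
        presentation_image=f"{stem}.jpg",
        presentation_text=f"{stem}.txt",
    )


print(page_files("tow0042", 7))
```

Zero-padding the page number keeps files sorting in page order in ordinary directory listings (0002 before 0010), and deriving all four names from one stem makes it easy to confirm that every archival file has its presentation counterpart.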

Website Development

Website development was carried out at various points in the project, and maintenance of the website will continue after active project development ends. The design is the product of collaboration among all of the project staff and has two levels. The first level consists of static HTML pages that provide basic site information and navigation, as well as an individual "main" page for each item in the collection. The second level consists of the CONTENTdm functions used to search the collection and to recreate the books and other items in digital form. By modifying the templates that accompany the software, we made the display of materials within CONTENTdm’s viewing tools as consistent as possible with the design of the static web pages.

Content Development

Content development refers to the research and writing that give users more information about the authors, their works, and the context in which those works were written. Nearly all of the original content on the Their Own Words website was written by the project coordinator and edited by the project director. This content includes biographical sketches of each author, contextual sketches of each book, pamphlet, letter, and diary, and the brief descriptions that appear on each item’s individual "main" page.

Student assistants also searched for book reviews published around the time of each book's original publication. Where such reviews were found, they have been transcribed and made available through a link from the book’s "main" page.

Page created: October 21, 2003


