Creating the Resource Center
Work on the JAMES BUCHANAN RESOURCE CENTER website was divided into several stages: the selection and preparation of materials, the selection of hardware and software, the pre-processing of materials and testing, final processing, website development, and content development. Each stage is described below.
Selection and Preparation of Materials
Several biographies of James Buchanan, along with his published collected works, have been out of print for many years and can therefore be difficult for interested researchers to acquire. Most of these out-of-print works have also passed into the public domain. Materials such as these, of limited availability and without copyright restrictions, were chosen for inclusion in the JAMES BUCHANAN RESOURCE CENTER. To facilitate the scanning process, the books ultimately selected for the project were sent to Wert Bookbinding (Grantville, PA) to be disbound.
Selection of Hardware and Software
Earlier digitization efforts in the Dickinson College Archives and Special Collections predetermined the hardware and software to be used. The equipment included two workstations (Micron computers and Sony monitors) and one flatbed scanner. The Hewlett-Packard Scanjet 7400c was used both for materials scanned in full color and for materials scanned in grayscale at lower bit depths.
The software used for this project included Dreamweaver MX, CONTENTdm 3.6 (upgraded to 3.7 following its July 2004 release), Adobe Photoshop 7.0, and ScanSoft OmniPage Pro 12.0. Dreamweaver MX was used to develop the static web pages that act as a portal to the materials in the database. The CONTENTdm software had been used with great success in a previous project, titled Their Own Words, and the software license was therefore upgraded to cover the JAMES BUCHANAN RESOURCE CENTER project. Photoshop was used for image editing and web development needs, and also to create the display images stored in the CONTENTdm database. OmniPage Pro was used for optical character recognition (OCR) of the scanned printed texts. The inevitable OCR errors, particularly with faded texts or difficult fonts, were addressed by proofreading and hand-correcting the output, which gave a greater degree of accuracy in the final digital product.
Pre-processing of Materials and Testing
Pre-processing covered the tests and decisions that had to be made before processing could begin in earnest. We determined the optimal image resolutions, bit depths, and file types for our archival and presentation images, and we considered what metadata to collect about the original items and their digital copies. (CONTENTdm provides Dublin Core as its default metadata structure, but additional fields may be added.) For the transcription of handwritten and printed texts, we adhered to internal standards adopted for earlier digital projects, to ensure consistency of presentation and interpretation. We also followed internal standards for file naming. More detailed information on the above is available in documents developed by the project staff and presented in PDF format.
Final Processing
The processing of materials did not always follow a linear path, because labor was divided among staff and different materials needed different preparation. The basic steps for books were as follows: disbinding, scanning, running OCR, creating a back-up copy of the archival TIFF image and RTF transcription, creating a presentation JPEG image and TXT transcription, uploading the information into the CONTENTdm database, entering the metadata, and then digitally "recreating" the book within CONTENTdm. For the collection of original correspondence, the only differences were that the letters had to be transcribed by hand (rather than run through OCR) and the transcriptions then double-checked for accuracy. All other steps, including the scanning, the preparation of TIFF and JPEG images, and the upload of information into the CONTENTdm database, were the same as for printed works.
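The ordering of the steps above can be sketched in a few lines of Python. This is only an illustration of the sequence described in the text, not the project's actual tooling: the step labels and the `steps_for` function are hypothetical names, and leaving out disbinding for letters is an assumption, since the text does not say how letters were prepared for scanning.

```python
# Hypothetical step labels paraphrasing the workflow described above.
# They are descriptive names only, not CONTENTdm commands.
BOOK_STEPS = [
    "disbind",
    "scan",
    "OCR",
    "back up archival TIFF and RTF transcription",
    "create presentation JPEG and TXT transcription",
    "upload to CONTENTdm",
    "enter metadata",
    "recreate item in CONTENTdm",
]


def steps_for(material):
    """Return the ordered processing steps for 'book' or 'correspondence'."""
    if material == "book":
        return list(BOOK_STEPS)
    if material == "correspondence":
        steps = []
        for step in BOOK_STEPS:
            if step == "disbind":
                # Assumption: loose letters need no disbinding.
                continue
            if step == "OCR":
                # Letters were transcribed by hand, then double-checked.
                steps.extend(["transcribe by hand", "double-check transcription"])
            else:
                steps.append(step)
        return steps
    raise ValueError(f"unknown material type: {material!r}")


if __name__ == "__main__":
    # Print the numbered sequence for a letter.
    for number, step in enumerate(steps_for("correspondence"), start=1):
        print(f"{number}. {step}")
```

Keeping the book sequence as the single source and deriving the correspondence sequence from it mirrors the text, which describes the letter workflow only in terms of how it differs from the book workflow.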
Website Development
Website development was carried out at various points of the project, and maintenance of the website will continue after active project development ends. The design is the product of collaboration among all of the project staff and consists of two levels. The first level consists of static HTML pages that provide basic site information and navigation. For this level, we have attempted to provide a simple and straightforward design for ease of use by researchers of all ages and skill levels. The second level consists of the CONTENTdm functions used to search the collection and to recreate the books and letters in their digital forms. By modifying the templates that accompany the software, we have made the display of materials within CONTENTdm’s viewing tools as consistent with the design of the static web pages as possible.
Content Development
Content development refers to the research and writing done to give users more information about the subject of this project, James Buchanan. This content includes various biographical sketches of Buchanan from encyclopedic resources. Also included are reviews of each book digitized for this project, contemporary with its original date of publication. The project coordinator performed extensive research to compile bibliographies of various types of resources, and also developed a comprehensive timeline of the life of James Buchanan, with notable national events that occurred during his lifetime displayed alongside to provide greater context.