Introduction | Selection
of Materials | Hardware/Software Selection | Pre-processing |
Final processing | Web Development | Content
Development
Introduction
Work on Their Own Words was divided
into identifiable and, at times, overlapping stages, including
the selection of materials, the selection of hardware and software,
pre-processing of materials and testing, final processing, website
development, and content development. For further information
on any one of these stages, please select the appropriate section
on this page.
Selection of Materials
Careful consideration went into the selection
of materials for inclusion in Their Own Words, and criteria
for selection were
particularly focused on a) materials difficult to access due
to their rarity, b) materials from the eighteenth and nineteenth
centuries, c) materials with broad interdisciplinary application,
and d) materials that are "at risk" from a preservation
point of view.
Following the selection and evaluation process, the materials
were examined for their ease of scanning. The majority of books
were either too fragile or too rigid to be scanned in their original
bindings and therefore were sent either to Water Street Bindery
(Lancaster, PA) to be disbound by hand or to Wert Bookbinding
(Grantville, PA) to be disbound by machine. The criteria used
for deciding which books to send to which bindery were age, quality,
and originality of binding. In just a few cases the decision
was made not to disbind the book, but rather to leave the text
and binding intact.
Hardware and Software
The hardware purchased for Their
Own Words included two workstations
(Micron computers and Sony monitors), one external hard drive,
and two flatbed scanners. Careful consideration went into the
selection of hardware for reliability and durability. One scanner,
a Hewlett-Packard Scanjet 7400c, was chosen for its large available
scanning area, high bit-depth for color scanning, and relatively
economical price for the features included. A Canon CanoScan
LiDE 30 was used for scanning at lower bit-depths. The external
hard drive, purchased from Western Digital, was used to store
back-up copies of archival TIFF images and page transcriptions
during the processing stages.
Software purchased included CONTENTdm 3.4 (upgraded to 3.5 following
its July 2003 release), Adobe Photoshop 7.0, and ScanSoft OmniPage
Pro 12.0. Dreamweaver MX, available through an institution-wide
site license, was used to develop the static web pages which
act as a portal to the materials in the database. Photoshop was
purchased for image editing and web development needs, and was
also used to create the presentation images stored in the CONTENTdm
database. OmniPage Pro, for optical character recognition (OCR),
was purchased for its ease of use and local availability. The
inevitable errors in OCR precision, particularly when dealing
with faded texts or difficult fonts, were overcome by proofreading
and then hand-correcting the output, allowing a greater degree
of accuracy.
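The effect of proofreading on OCR output can be illustrated with a small comparison of the raw and hand-corrected text. The following is a minimal sketch using Python's standard difflib module, not a description of the project's actual workflow; the function names and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def ocr_similarity(ocr_text: str, corrected_text: str) -> float:
    """Return a 0..1 similarity ratio between raw OCR output and its corrected version."""
    return SequenceMatcher(None, ocr_text, corrected_text, autojunk=False).ratio()


def changed_spans(ocr_text: str, corrected_text: str) -> list:
    """List (ocr_fragment, corrected_fragment) pairs where the proofreader changed the text."""
    matcher = SequenceMatcher(None, ocr_text, corrected_text, autojunk=False)
    return [
        (ocr_text[i1:i2], corrected_text[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical sample: the OCR engine misread "h" as "b" in a faded text.
raw = "Tbe quick brown fox"
fixed = "The quick brown fox"
print(changed_spans(raw, fixed))
print(round(ocr_similarity(raw, fixed), 3))
```

A ratio close to 1.0 suggests that a page needed few corrections, while a low ratio may flag a text whose condition or typeface makes hand transcription the better choice.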
The decision to purchase CONTENTdm was made following weeks
of exploring other options, both existing database software
and developing a database ourselves. In the end, outsourcing
the design of our own database would have required more
time and money than the grant provided. Other database software
packages were deemed unsatisfactory for our needs due to lack
of features, difficulty of use and maintenance, lack of customizability,
and concerns over price. Upon preliminary selection of CONTENTdm
as our digital collection management software, we evaluated a
trial copy of the software to determine its appropriateness for
our project. Though we identified several problems in terms of
how we wished to display and search our materials, we determined
that adjustments could be made to the metadata to aid searchability.
We also felt confident that the software would continue to be
developed through time so that future upgrades would address
some of our concerns.
Pre-processing
Pre-processing involved the various tests and procedures that
needed to be decided upon prior to beginning the processing in
earnest. To that end, we determined the optimal image resolutions,
bit-depths, and file types necessary for our archival and presentation
images. Time was spent exploring what metadata we wanted to collect
with regard to the original items and subsequent digital copies.
(CONTENTdm provides the Dublin Core as its default metadata structure,
but additional fields may be added.) Also, criteria for the transcription
of handwritten and printed texts were explored and adopted to
ensure consistency of presentation and interpretation. We also
determined at this time whether the physical condition or
unique typefaces of any printed texts warranted hand transcription
instead of OCR processing. Finally, we established a system for
file naming. For more detailed information on the above, we have
included the following documents developed by the project staff,
presented in PDF format:
Final Processing
The processing of materials did not always follow a linear
path due to the division of labor and different preparations needed
for different materials. The basic steps that most materials
went through were as follows: disbinding, scanning, transcribing
or OCRing, creating a back-up copy of the archival TIFF image
and RTF transcription, creating a presentation JPEG image and
TXT transcription, uploading the information into the CONTENTdm
database, inputting the metadata, and then digitally "recreating" the
book or collection within CONTENTdm. Not all items followed
this path directly, and, as mentioned above, some items needed
fewer or additional steps.
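Two of the steps above, backing up the archival files and pairing each one with its presentation copy, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the directory layout and the file name page_001.tif are hypothetical, since the project's file naming system is not reproduced here, and the conversion from TIFF to JPEG itself was carried out in Photoshop rather than in code.

```python
from pathlib import Path
from typing import Iterable, List
import shutil

# Archival -> presentation formats named in the text: TIFF -> JPEG, RTF -> TXT.
PRESENTATION_SUFFIX = {".tif": ".jpg", ".tiff": ".jpg", ".rtf": ".txt"}


def presentation_path(archival: Path, presentation_dir: Path) -> Path:
    """Derive the presentation file path for an archival file (hypothetical layout)."""
    suffix = PRESENTATION_SUFFIX.get(archival.suffix.lower())
    if suffix is None:
        raise ValueError(f"not an archival format: {archival.name}")
    return presentation_dir / (archival.stem + suffix)


def back_up(archival_files: Iterable[Path], backup_dir: Path) -> List[Path]:
    """Copy each archival file to the back-up location, preserving timestamps."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(f, backup_dir)) for f in archival_files]


# Hypothetical usage: where would the presentation copy of one scanned page go?
print(presentation_path(Path("archive/page_001.tif"), Path("web")))
```

Keeping the mapping between archival and presentation formats in one table makes it easy to check that every archival file has a presentation counterpart before uploading.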
Website Development
Website development was carried out at various points
of the project, and maintenance of the website will continue following
the end of active project development. The design decided upon
is the product of collaboration among all of the project staff
and comprises two levels. The first level consists of
static HTML pages that provide basic site information and navigation
as well as an individual "main" page for each item
in the collection. The second level consists of the CONTENTdm
functions used to facilitate searching and recreating the books
and other items in their digital forms. Through the modification
of templates that accompany the software, the display of the
materials within CONTENTdm’s viewing tools has been made
as consistent with the design of the static webpages as possible.
Content Development
Content development refers to the processes of research
and writing to provide more information for users about the authors,
their works, and the context in which these works were written.
Almost all of the original content on the Their
Own Words website
was produced by the project coordinator and edited by the project
director. This content includes biographical sketches of each
author and contextual sketches of each book, pamphlet, letter,
and diary, as well as the brief descriptions which appear on
each item’s individual "main" page.
Research was also done by student assistants to locate
book reviews contemporary with the original publication date of each book.
When such reviews have been identified, they have been transcribed
and made available through a link from the book’s "main" page.