Reason for narrative
The audience for this narrative is admittedly limited. The narrative is provided to:
  • Share with entities contemplating a similar project the issues that the Legislative Reference Bureau encountered in the course of prosecuting the project.
  • Allow individual project members to consider the project as a whole.
The Legislative Reference Bureau, an agency of the Pennsylvania General Assembly, has undertaken a project to digitize the annual session laws published from time to time in the Province and Commonwealth of Pennsylvania from 1682 to the most recently completed session of the Legislature. While the project has for the most part made use of in-house resources, it has been necessary to outsource the scanning of the session laws. Project members perform project tasks as an adjunct to their primary duties. Predictably, the project relies heavily on database objects such as tables, queries, forms and reports. While keeping documentation accurate and up-to-date has been a challenge, its value to newly recruited and casual project members is unquestionable. The TIFF files created by the scanning process are used to create PDF files, which are used for both Web site and preservation purposes. Lessons learned from the project are freely shared.
Project purposes
Public access to the session laws, especially laws from the seventeenth through the nineteenth centuries, has been difficult to obtain. Volumes of the early session laws are rare and typically held in institutional collections to which public access is restricted. The session laws, especially the volumes of the seventeenth and eighteenth centuries, contain fascinating legal insights. Making the session laws available on the World Wide Web allows the public, not just a few scholars and researchers, to read these remarkable documents. While the initial goal of the project was to provide public access to the session laws, the horrific events of September 11, 2001, have placed accelerated emphasis on disaster recovery planning. The project's initial purpose of affording public access has dovetailed nicely with a corollary purpose: preserving session laws in digital and micrographic format for future generations.
Driving force
The driving force for the project has been Carl L. Mease, former Director of the Legislative Reference Bureau. Single-handedly, he began to sketch the project in the spring of 2001.
Project management plan
In the summer of 2001 a project management plan was adopted and documentation was written stating the project's purposes and initially assigning duties to project members. The plan provided for use of in-house hardware and software and in-house human resources to perform project tasks. The critical piece of in-house hardware was an enterprise level copier augmented with a networked scanning package. The project management plan anticipated that work on the project would be collateral to a project member's primary duties. Since legislative work remains to this day somewhat seasonal in nature, this method of human resource allocation appeared feasible.
Documentation, written procedures for performing project tasks, was created with a broad brush early on. Project members were encouraged to share their experiences with the documentation writer, who modified the documentation frequently to reflect actual experiences and suggestions of project members. Involving project members in the documentation writing process enhanced their stake in the project and resulted in accurate, relevant documentation.
Implementing project management plan
To afford project members an opportunity to become familiar with their tasks, a decision was made to enter a test mode using recent session laws volumes which were readily available, in good condition and for the most part free of dust and other debris, making them ideal subjects for scanning. The first step in the overall process was to enter data for each volume into an Access® table using a form. Each page, including blank divider pages, of each volume was accounted for. The data in this table would later be used to create the PDF files corresponding to acts, vetoes and other legislative actions. After data entry was complete, the volumes were scanned to produce TIFF (Tagged Image File Format) files. Scanning using the copier outfitted with specialized hardware and software was not without incident, as several service visits were required to reprogram the machine for full-time use as a scanner. Nevertheless, some TIFF files were created and placed in directories named for the volume year. Test PDF files were created.
Serious scanning problems develop
After the volumes for several session years had been scanned, it was time to experiment with older volumes. Page margins on the older volumes and page sizes that varied from year to year resulted in anomalies on the TIFFs, primarily in the form of dark horizontal lines. Service representatives for the copier manufacturer reprogrammed the scanning software in an attempt to address these anomalies. Valiant as their efforts were, it was their final advice that the machine was being used for out-of-specification work. An additional problem with the copier being placed into service as a scanning device was the transmission of debris from the older volumes to the copier's rollers. This transmission was so intense with the older volumes that service would be required after a run of only 300-500 pages. Scanning work was temporarily halted while various options with the existing device were explored.
Outsourcing investigated
A delegation of project members made arrangements to visit the facilities of the Division of Records Administration and Image Services, Bureau of Archives and History of the Pennsylvania Historical and Museum Commission, an agency of the Executive Branch. The Division performs various document management services, including scanning of documents, on a chargeback basis to State agencies. Following discussions with key Division personnel and an examination of the Division's hardware assets, the delegation unanimously recommended that the scanning work be outsourced to the Division. The recommendation was approved. A letter of understanding containing customary provisions and a few project-specific provisions was executed.
Project management plan revised
Since the original project management plan provided for in-house scanning, it was necessary to revise the plan to delete in-house scanning and related tasks. New tasks were added to address preparing volumes for shipment to the Division and examining the results of the scanning process. A new form and table had to be developed to generate a directory structure report for each volume, prescribing for the Division a directory structure for the TIFF files for each volume. A hard copy of this report accompanies each volume that is shipped to the Division.
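The directory structure report described above can be sketched in a few lines. This is only an illustration and not the Bureau's Access® report: the record fields, the folder naming scheme (document type plus zero-padded number) and the page numbers are all assumptions made for the example, as is the premise that each scanned page yields one TIFF file.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Document:
    """One document in a volume. All field names here are hypothetical."""
    doc_type: str            # e.g. "act" or "index"
    number: Optional[int]    # act number, or None for unnumbered documents
    first_page: int          # first scanned page belonging to the document
    last_page: int           # last scanned page belonging to the document


def directory_report(year: int, documents: List[Document]) -> List[str]:
    """Return report lines: the year folder, then one subfolder per document."""
    lines = [f"{year}/"]
    for doc in documents:
        # Assumed naming scheme: type plus zero-padded number, e.g. "act0050".
        folder = doc.doc_type if doc.number is None else f"{doc.doc_type}{doc.number:04d}"
        # Assumes one TIFF file per scanned page.
        page_count = doc.last_page - doc.first_page + 1
        lines.append(
            f"  {folder}/  pages {doc.first_page}-{doc.last_page} ({page_count} TIFF files)"
        )
    return lines


if __name__ == "__main__":
    # Hypothetical page ranges, used only to show the shape of the report.
    volume = [Document("act", 50, 101, 105), Document("index", None, 600, 612)]
    print("\n".join(directory_report(1791, volume)))
```

A printed report of this kind gives the scanning operator an unambiguous target folder for every run of pages, which is the role the hard copy plays when it travels with a volume.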
Treasure hunt
Volumes for the first collection to be scanned, the Statutes at Large, had to be located. While the Bureau houses several sets of the Statutes at Large, they are working volumes. For many years, it was customary to open the volumes at the end of a session to record amendments, repeals and other actions affecting individual laws on the page margins. A hunt began for unmarked volumes in acceptable condition to undergo scanning. The project's chief investigator prowled the subbasement of the State Library, the stacks of the State Law Library, the stacks of the Senate Library and other musty locations in search of these rare volumes. Some of the volumes were donated outright, while others were loaned to the Bureau. This dogged determination bore fruit as the Bureau was able to ship to the Division all 17 printed volumes of the Statutes at Large.
Outsourcing scanning
Volumes destined to be scanned by the Division were carefully packaged for shipment. A directory structure report prescribing the grouping of TIFF images accompanied each volume. Scanning was performed using either a production-style rotary duplex scanner or a large flatbed scanner. The flatbed scanner was used for volumes lent on condition that they be returned in the same condition as when borrowed. The production-style rotary scanner was used for disposable volumes with durable pages. In all cases, the pages were scanned at a resolution of 300 dpi (dots per inch), an ample resolution for conversion to text, Web display and preservation purposes. The Division saved the TIFF images for each volume to a CD-R. Upon Bureau receipt of a copy CD-R for a volume, the contents were copied to prescribed directories on a Bureau server. Project members then viewed each TIFF file to ensure that all files were of acceptable quality. While a few scanning glitches occurred, by and large the Division produced TIFFs of exceptional quality. Where necessary, the Division improved its original files with image enhancement software. Upon receipt of notice that all images for a volume were accepted by the Bureau, the Division sent the Bureau the master CD-R for the volume.
Ongoing data entry
While the project relied upon several Access® tables and forms, the cornerstone table contained fields of two kinds. Some of the fields contained inventory values necessary for preservation and housekeeping purposes, such as the location and names of TIFF files. Others contained values necessary to dynamically generate pages to display PDF files on the Web, including year of enactment, document type (act or index, to name just two), location and name of the PDF file, document subject, document keyword and PDF file size. Data entry on an Access® form bound to the cornerstone table was performed on an ongoing basis by several project members.
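For readers who think in code rather than in database objects, one row of the cornerstone table might be modeled roughly as follows. The field list follows the narrative, but every field name, type and sample value is an assumption; the actual Access® column definitions are not given here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CornerstoneRecord:
    """One row of the cornerstone table. All names and types are hypothetical."""
    # Inventory values for preservation and housekeeping
    tiff_dir: str
    tiff_names: List[str]
    # Values used to generate Web pages
    year: int
    doc_type: str            # "act", "index", ...
    pdf_path: str
    subject: str
    keyword: str
    pdf_size_kb: int


def records_for(records: List[CornerstoneRecord], year: int,
                doc_type: Optional[str] = None) -> List[CornerstoneRecord]:
    """Filter rows by year and, optionally, by document type."""
    return [r for r in records
            if r.year == year and (doc_type is None or r.doc_type == doc_type)]


def web_listing(record: CornerstoneRecord) -> str:
    """Format the values a dynamically generated page would display."""
    return (f"{record.year} | {record.doc_type} | {record.subject} | "
            f"{record.pdf_path} ({record.pdf_size_kb} KB)")
```

Keeping the inventory fields and the Web fields in the same record means a single data entry pass serves both the preservation workflow and the Web site.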
Creation of PDF files
Using a report of the data in the cornerstone table for each volume, project members identified TIFF files to associate with a particular document in a volume. For example, to create a PDF file for an act in Volume 14 of the Statutes at Large named "Confer on certain associations of the citizens of this commonwealth the powers and immunities of corporations, or bodies politic in law," it was necessary to navigate to the folder containing TIFF files for 1791. Further navigation was necessary to find the folder containing the five TIFF files associated with this act, Act 50. After the correct constituent TIFF files were selected, a PDF file with an Optical Character Recognition (OCR) attribute was created using ScanSoft's PaperPort® 6.5 Deluxe Edition and TextBridge® Pro 9.0 Business Edition. The OCR attribute allows for searching within a document using the "Find" feature of reader software. Upon completion of each batch of PDF files, quality and accuracy control protocols were applied to the batch. For preservation purposes, a master CD-R and copy CD-R of the PDF files for each session year are created. After these discs are created, all TIFF files for the session year are deleted from the server.
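The navigation step described above, locating the constituent TIFF files for one document before the PDF is built, could be sketched as follows. This covers only the file selection; the PDF and OCR work was done in the ScanSoft products named above and is not reproduced. The folder layout (year folder, then document folder), the folder name "act0050" and the zero-padded file names are assumptions made for the example.

```python
from pathlib import Path
from typing import List, Union


def tiffs_for_document(volume_root: Union[str, Path], year: int, folder: str,
                       expected_count: int) -> List[Path]:
    """Return the TIFF files for one document, sorted by file name.

    Assumes a hypothetical layout of <root>/<year>/<folder>/ and file names
    that sort into page order (for example, zero-padded page numbers).
    """
    doc_dir = Path(volume_root) / str(year) / folder
    tiffs = sorted(
        p for p in doc_dir.iterdir()
        if p.is_file() and p.suffix.lower() in {".tif", ".tiff"}
    )
    # Guard against a missing or stray page before the PDF is created.
    if len(tiffs) != expected_count:
        raise ValueError(
            f"{doc_dir}: expected {expected_count} TIFF files, found {len(tiffs)}"
        )
    return tiffs
```

For the Act 50 example, a call such as tiffs_for_document(root, 1791, "act0050", 5) would return the five page images in order, or raise an error if the folder held a different number of files.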
Storage of discs
Upon completion of work on the Statutes at Large, the copy TIFF CD-R and the copy PDF CD-R are placed in secure, temperature- and humidity-controlled off-site storage. The masters will be held on site pending project completion, when data on all master discs will migrate to 16mm microfilm for placement in secure off-site storage.
Research and writing
Well-versed in modern legislative procedures and conventions, researchers traveling back in time fell into a labyrinth of arcane legislative procedures, hidden certifications, dead-end index entries and spurious spines on rebound volumes. Web-style writing principles were followed to distill the results of many hours of statutory history research into succinct discussions of relevant topics.
Planning the Web site
It was recognized early on that the Web site had to be designed and developed to address the needs and interests of a broad group of users. It had to be basic and straightforward to speak to casual users. It had to be comprehensive and accurate to address the needs of legal scholars and researchers. It had to include pages, carefully written using Web-style writing guidelines, to explain the project and to set forth statutory authority to justify selection of the Statutes at Large and Smith's Laws collections. Finally, as a product of the drafting agency for the Pennsylvania General Assembly, it had to carry to the electronic medium the same level of quality control enforced in the generation of legislative documents.
Design issues
It was necessary for the site to make use of proprietary technology, Adobe Systems' Acrobat Reader®. Ideally, a site should not require a user to acquire a supplement to the browser to view content. The nature of the content on this site, however, precluded obedience to this fundamental principle of usability. Aside from the unavoidable requirement of the Adobe plug-in, it was felt that the site should shun technologies that would limit its use. Since a Java applet requires the Java Virtual Machine to be enabled and, in some instances, acquired, no Java applets were used. The site was tested using various versions of Internet Explorer® and Netscape® products. The issue of screen resolution and monitor sizes was difficult to address satisfactorily. Since most users have their monitors set at a resolution of 800x600 pixels, this resolution was used as a benchmark with the understanding that the pages will render less than ideally on a 21-inch monitor set at a resolution of 1024x768 pixels.
Development issues
While Access® has proved to be indispensable to the project, when its use as a relational database management system (RDBMS) for Web site deployment was investigated, it was found that its limitation on the number of concurrent users might adversely affect access to the dynamically generated pages that query the data. To address the concurrency issue, a license was purchased for SQL Server 2000®. Data in Access® tables is imported into SQL Server 2000® on a scheduled basis. Calls to the database server are made using Active Server Pages, Microsoft's server-side scripting engine. Both development and design were undertaken using Allaire's HomeSite® 4.5.2. Creation of graphics for the site was outsourced to a professional graphic artist.
Server hardware and software
The Web server and dedicated database server are IBM® xSeries® machines. The Web server runs IIS 5 on top of Windows® 2000 Server®. SQL Server 2000® is installed on the database server.
Usability testing
With the understanding that site development would be an ongoing process, a usability testing group consisting of five diverse users worked with a beta version of the site in a computer laboratory. The testing methodology was semistructured, allowing for both spontaneous user observations and for tasks to gauge the effectiveness of site navigation, searching and other elements of functionality. Since text pages were considered to be a critical component of the site, members of the testing group were asked to read these pages verbatim. User observations and their experiences with tasks were recorded in real time. The usability testing record was presented to site developers who acted on the record to produce a revised beta version. The testing and redevelopment process was repeated until a production version of the site was achieved.
Incremental availability of session laws
A determination was made to deploy the site upon completion of work on the Statutes at Large. This collection would offer sufficient value to users and simultaneously provide site use data to aid in rolling out the other two collections, Smith's Laws and Pamphlet Laws.
Lessons learned
While the project manager will continually monitor server data and address user suggestions and comments, as well as respond to internally generated ideas for improving all facets of the project, the lessons learned thus far may be of some interest to other entities contemplating similar projects. The lessons learned:
  • A project management plan and documentation must be developed and updated as needed. Documentation, here defined as detailed procedures for performing a specific task, must be written in plain language, using a step-by-step style so that a newly recruited project member can immediately begin to perform a task. The project management plan and documentation for all tasks should be combined into a project booklet and distributed to all project members.
  • Scanning should be outsourced. When it came time to scan older, fragile volumes, it became apparent that a specialized device was needed to capture the pages without scanning glitches. Since specialized scanners are quite costly and require operator training, it did not seem feasible to acquire a specialized scanner for the project.
  • At the heart of the project lies a database. It is imperative to carefully consider database design and the manner of data entry. While many database solutions exist, an Access® database with table, form, query and report objects served the project well. Fortunately, a project member was able to bring years of Access® programming experience to the project.
  • Regardless of the number of project members, there will be a set of members who initially are or become key members. Whatever their status, the members of this group should represent diverse backgrounds and skill sets.
  • E-mail is not an adequate medium for communication among project members. Instant messaging allows for real-time problem raising and resolution and enables absent project members to review archived project issues upon return to the project.
  • The project manager should schedule regular meetings with all project members rather than meeting with individual project members to discuss issues peculiar to them. Regular meetings enhance the sense of community and allow for in-person delivery of project and individual progress reports.
  • The project manager must recognize exemplary individual effort and publicize the recognition to all project members.
  • While far from perfect, the project model has proved to be viable, enabling the Bureau to prosecute the project economically and with few external dependencies.
No commercial endorsement
In this narrative, references are made to specific entities and particular products. These references have been included in the interest of setting forth a complete project narrative. There is no intention to endorse a specific entity or a specific piece of software or hardware.