Pleiades 1 (February 2006 – June 2008)

This section is adapted from the Pleiades 1 final report submitted to the National Endowment for the Humanities in partial fulfillment of the terms of the Preservation and Access Research and Development grant that funded the prototyping of Pleiades during the period February 2006 – June 2008. It is presented here to provide a public accounting of our work. We hope that this history will be of use in the planning of other projects, and will assist proposers and funders alike in planning for and managing change.

The successful Pleiades 1 proposal promised "the development, testing and evaluation of a spatially-enabled, multilingual community support system enabling interested parties worldwide to participate in the maintenance, diversification and beneficial reuse" of the Classical Atlas Project (CAP) data, an unspecified subset of which was to be digitized as part of the initial effort. The work plan called for a two-year period of iterative design, development and testing (February 2006 – January 2008) that would climax in the final six months with the recruitment of new participants and the opening of the collaborative content management system to initial public use. A no-cost extension stretched the period of performance through June 2008.

Significant research, planning and preparation of data preceded the funded project. Time estimates and initial designs for major aspects of the work were based on the best forecasts of project staff and our expert Steering Committee. For the project to be completed on schedule and in the manner originally proposed, the following conditions had to be met without difficulty:

  • Continued availability of key personnel
  • Prompt identification and hiring of the software developer
  • Creation of web mapping and gazetteer services and of export functions for archival purposes, and implementation of Federal Geographic Data Committee (FGDC) metadata standards
  • Realization of the geographic data model in a spatially-enabled relational database, linking it with Plone and with appropriate visualization and analysis component plugins in Plone
  • Porting of the AWMC bibliographic database from Microsoft Access to Plone
  • Effective implementation of the candidate editorial workflow (based on the procedures of the Classical Atlas Project) in Plone

Shortly after the grant award was announced, it emerged that, for domestic reasons, Tom Elliott (then AWMC Director) would be moving away from the Chapel Hill area permanently, and therefore could no longer continue as Director. The proposal had envisioned Elliott devoting 66% of his time to the Pleiades project in order to provide both project management and software design/development leadership. Given the key role Elliott had played in envisioning and preparing for Pleiades, the Principal Investigator (Richard Talbert) undertook to reorganize the project with Elliott as full-time Project Manager, a position he would exercise remotely as a "teleworking" employee of UNC-CH.

Informal identification of potential candidates for the software developer position began early; however, Sean Gillies came to Elliott's attention through a Steering Committee member only days before the award was publicly announced. By virtue of his unique blend of relevant skills and experience, Gillies soon emerged as the leading applicant for the position once it was authorized and could be formally advertised, but his existing consulting contracts precluded a hiring date prior to June 2006. Talbert and Elliott made the decision to wait until Gillies was available to fill the position, aware that an extension of the grant period would be required in consequence, as well as additional support to enable Elliott to continue working on the project through June 2008. Time has demonstrated the wisdom of this decision, as swift-moving, unpredictable changes in geospatial and information technology quickly undermined many of the specific implementation decisions that had been proposed. Gillies' imagination, technical acumen and connections to many aspects of the geospatial industry proved invaluable in successfully navigating this rapidly changing landscape.

Within months after work on the project began (and just as Gillies joined the team), Google Earth was released. Although many researchers familiar with the development and use of Geographic Information Systems had been aware of Keyhole Software and its virtual globe application, Google’s purchase of the company and free distribution of an upgraded version of its software, backed by streaming distribution of high resolution earth imagery, served to revolutionize expectations about the display and manipulation of geographic information on the web. The advent of Google Earth is just one example of a major paradigm shift – not only in online mapping, but also in the nature, origins and uses of digital spatial information – that is still unfolding. A range of free mapping and spatial data services are now available, and these are increasingly easy to use in combination with information drawn from other sources on the web.

These developments led us to change the project’s emphasis. Instead of implementing the suite of mainstream, enterprise-level GIS service protocols outlined in the original proposal, we elected to support the lightweight, easy-to-use formats associated with Google Earth and other emerging “neo-geographical” applications. This decision reflected our assessment that these applications would quickly achieve widespread use, which would then drive standardization and regularization of the associated file formats. Our estimation has been validated, not only by the near-ubiquity of Google Earth and online mapping tools, but by the recent adoption of the Keyhole Markup Language (KML; the encoding format used by Google Earth) as a recognized standard by the Open Geospatial Consortium (OGC), the same body that had elaborated the formats and protocols identified in our original proposal. To date, the vast majority of prospective users of Pleiades have been interested in the neo-geographical approach.

At the beginning of the project, Elliott made the identification of a “static file format” for Pleiades data a significant priority. The goal was to adopt a standard encoding scheme or format that could be used to encode every aspect of the Pleiades content set completely and unambiguously. The resulting static files would furnish users with an easy way to move large subsets (or even the entire published collection) of Pleiades content into a desktop GIS or other system. Moreover, such a format would be readily amenable to time-stamped release, and to deposit into multiple digital archives as a preservation method. Our preference was for an open, widely used standard. Consultation with spatial and archaeological data archivists, during the grant period and subsequently, has so far failed to identify a fully satisfactory solution for a static export format, although a slowly growing number of archives are accepting various GIS formats, at least under bit-preservation terms.

The original proposal anticipated that the Alexandria Digital Library (ADL) Gazetteer format would provide such a vehicle, and it went so far as to propose that Pleiades would support the full ADL Gazetteer Protocol for the exchange of such data. Unfortunately, development of this putative standard stalled with the retirement of its primary research leader (a Pleiades Steering Committee member), and it has not seen significant adoption outside its original application in the Alexandria Digital Library. Its complex and deeply hierarchical XML file structure lost ground in competition with the Open Geospatial Consortium’s detailed Geography Markup Language (GML; http://en.wikipedia.org/wiki/Geography_Markup_Language), which is in use by government and commercial GIS entities, and with the lightweight KML format used by Google Earth. In addition, many potential Pleiades users requested the well-known Shapefile format, which we would need to package with multiple data files containing tabular attribute data in comma-separated values (CSV) format. Ultimately, we elected to defer a final decision, and implementation of a solution, to a later phase of the project in hopes that a more appropriate standard approach would emerge.

After careful consideration, we departed from the original proposal in another technical area. Rather than implementing a hybrid system with content in Plone and geo-location data in a separate relational database, we chose to geo-enable the Zope Object Database (ZODB), which Plone incorporates. This decision made it much easier to test the geospatial aspects of our system: relatively low-level development within a single database is easier to test than the integration of two different databases. This approach also brought us into alignment with a small but dynamic group of Plone and Python developers who were moving in a similar direction. Their adoption of our zgeo.spatialindex package has greatly helped us through contributions of code and robust testing.

We elected to defer full implementation of FGDC metadata to a later phase of the project, subject to recommendations by expert members of the community. The FGDC guidelines are more concerned with integrated datasets than granular data. Pleiades allows users to create their own datasets by freely aggregating the vetted locations, names, and places. It is not clear how to apply the FGDC metadata model comprehensively to this situation. Insofar as FGDC standards of description apply to specific aspects of our content (for example, horizontal accuracy and precision of location coordinates), we have incorporated them into our data model. We are also exploring visualization conventions that take metadata values into account, surfacing them to the user in intuitive ways (again, in light of published FGDC guidance). In particular, we are considering an upgrade of the KML we produce to include not only a coordinate pair for a point location, but also a circle whose radius corresponds to the nominal horizontal error in those coordinates, based on the precision and accuracy of the underlying data.
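
To make the proposed KML upgrade concrete, the following is a minimal sketch, not the Pleiades implementation. It assumes a spherical Earth, a nominal horizontal error expressed in metres, and hypothetical function names (`error_circle`, `placemark_kml`); it does not handle locations near the poles or the antimeridian. It emits a single Placemark whose geometry combines the point itself with a polygon approximating the error circle.

```python
import math
from xml.sax.saxutils import escape

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation (assumption)


def error_circle(lon, lat, radius_m, segments=36):
    """Return a closed ring of (lon, lat) pairs approximating a circle on the sphere."""
    lat_r = math.radians(lat)
    lon_r = math.radians(lon)
    d = radius_m / EARTH_RADIUS_M  # angular distance in radians
    ring = []
    for i in range(segments):
        bearing = 2 * math.pi * i / segments
        # Standard great-circle "destination point" formulas.
        p_lat = math.asin(
            math.sin(lat_r) * math.cos(d)
            + math.cos(lat_r) * math.sin(d) * math.cos(bearing)
        )
        p_lon = lon_r + math.atan2(
            math.sin(bearing) * math.sin(d) * math.cos(lat_r),
            math.cos(d) - math.sin(lat_r) * math.sin(p_lat),
        )
        ring.append((math.degrees(p_lon), math.degrees(p_lat)))
    ring.append(ring[0])  # a KML LinearRing must repeat its first vertex
    return ring


def placemark_kml(name, lon, lat, horizontal_error_m):
    """Build a KML document with a point and its horizontal-error circle."""
    ring = error_circle(lon, lat, horizontal_error_m)
    # KML coordinate tuples are ordered longitude,latitude,altitude.
    coords = " ".join("%.6f,%.6f,0" % (x, y) for x, y in ring)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "  <Placemark>\n"
        "    <name>%s</name>\n"
        "    <MultiGeometry>\n"
        "      <Point><coordinates>%.6f,%.6f,0</coordinates></Point>\n"
        "      <Polygon><outerBoundaryIs><LinearRing>\n"
        "        <coordinates>%s</coordinates>\n"
        "      </LinearRing></outerBoundaryIs></Polygon>\n"
        "    </MultiGeometry>\n"
        "  </Placemark>\n"
        "</kml>\n"
    ) % (escape(name), lon, lat, coords)


if __name__ == "__main__":
    # Invented illustrative values: a location with a nominal error of 500 m.
    print(placemark_kml("Example place", 23.7, 37.9, 500.0))
```

Grouping both shapes in one MultiGeometry keeps the point and its uncertainty attached to the same named feature, so a viewer such as Google Earth would show them together.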

The migration plan for the bibliographic database was re-imagined after a series of attempts to model it effectively in Plone. One NEH evaluator of our proposal had commented on the complexity of the relational structure for the database supplied in the proposal (a database that was already in use at AWMC, but required porting to a more robust database system), and suggested that the Extensible Markup Language (XML) might provide a more natural way to store bibliographic information. The object-oriented approach in Plone proved equally complicated, and we eventually chose a different approach entirely: this combines a simple model for bibliographic citations in Plone with a collection of web-facing bibliographic records authored in XML (using the Metadata Object Description Schema, or MODS, standard promulgated by the Library of Congress).
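
The division of labor between a simple citation and a fuller XML record can be illustrated with a short sketch. It is not the Pleiades code: the `Citation` class, its field names, the `mods_record` helper and the placeholder title, author and URL are all assumptions made for illustration. Only the MODS namespace and element names (`mods`, `titleInfo`, `title`, `name`, `namePart`, `originInfo`, `dateIssued`) come from the Library of Congress schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def _tag(name):
    """Qualify an element name with the MODS namespace."""
    return "{%s}%s" % (MODS_NS, name)


def mods_record(record_id, title, authors, year):
    """Serialize a very small MODS record (title, personal names, date issued)."""
    mods = ET.Element(_tag("mods"), {"ID": record_id})
    title_info = ET.SubElement(mods, _tag("titleInfo"))
    ET.SubElement(title_info, _tag("title")).text = title
    for author in authors:
        name = ET.SubElement(mods, _tag("name"), {"type": "personal"})
        ET.SubElement(name, _tag("namePart")).text = author
    origin = ET.SubElement(mods, _tag("originInfo"))
    ET.SubElement(origin, _tag("dateIssued")).text = str(year)
    return ET.tostring(mods, encoding="unicode")


@dataclass
class Citation:
    """Hypothetical 'simple citation': a short label plus a pointer to the full record."""
    short_title: str
    pages: str
    record_url: str  # where the web-facing MODS record would live (placeholder)


if __name__ == "__main__":
    # Placeholder data, not a real bibliographic reference.
    print(mods_record("example-1", "Example Title", ["Author, A."], 2000))
    print(Citation("Author 2000", "12-15", "http://example.org/bib/example-1.xml"))
```

In such an arrangement the content item would carry only the lightweight citation, while the detailed bibliographic description would be maintained once, in the XML record it points to.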

Our timetable for implementation of workflow in Plone incurred delays when we decided to wait for a major release of Plone that was slated to offer easier site customization, workflow definition and content versioning. The last of these features was particularly important because we have always viewed the ability to produce a change history for individual content items as an essential prerequisite for scholarly citation and verification. In the proposal, we had identified a Plone plugin that was then under active development; we had intended to use this to support versioning. After the start of the project, the Plone development community elected to cease work on this plugin, and instead promoted versioning as a major feature of the 3.0 release. As we waited for this release, which finally appeared in late 2007, we put extra labor into low-level modifications to the Plone indexing and content management functions that will enable enhanced spatial search and spatial feature correlation in a second phase of Pleiades. We also followed the guidance of the Plone community in preparing our existing software for the upgrade to Plone 3. This work paid off: the transition from Plone 2.5 (our starting point) to Plone 3 was not difficult.

We committed significantly more time than expected to refining and experimenting with our proposed editorial workflow. The project proposal described a complex process involving multiple content states, hierarchical roles and regimented review steps. It reflected an idealized version of the mature Classical Atlas Project editorial workflow, adjusted for a then-hypothetical digital environment. As work on Pleiades advanced, however, it emerged that although this candidate workflow could be fully implemented in Plone, it was too rigid and too complicated to serve the full range of needs of our contributors and editors. It would obstruct, rather than encourage, rapid piecewise improvement of content, and would overly burden editors with a series of mandatory, computationally mediated checklists. Through a series of prototyping exercises conducted in consultation with key Steering Committee members and Technical Observers, we devised a simpler editorial workflow that preserves the essential safeguards inherent in the Classical Atlas Project working methods, while providing the Pleiades Community with a fluid process for quickly making and vetting suggestions and additions. The Pleiades workflow of 2008 is radically simplified, consisting of the fewest content states, transitions, and user roles that can possibly work. This reduces the development cost of implementation and emphasizes community and collaborative interaction.
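
The shape of such a minimal workflow can be sketched as a small transition table. The state names, transition names and roles below are hypothetical, chosen only for illustration; this report does not enumerate the actual Pleiades states or roles, and the real workflow is defined in Plone rather than in standalone code like this.

```python
class WorkflowError(Exception):
    """Raised when a transition is not defined or not permitted for a role."""


# Hypothetical workflow: (current state, transition) -> (new state, required role).
TRANSITIONS = {
    ("drafting", "submit"): ("pending", "contributor"),
    ("pending", "publish"): ("published", "editor"),
    ("pending", "reject"): ("drafting", "editor"),
}


def apply_transition(state, transition, role):
    """Return the new state, or raise WorkflowError if the move is not allowed."""
    try:
        new_state, required_role = TRANSITIONS[(state, transition)]
    except KeyError:
        raise WorkflowError("no transition %r from state %r" % (transition, state))
    if role != required_role:
        raise WorkflowError("role %r may not %r" % (role, transition))
    return new_state


if __name__ == "__main__":
    state = "drafting"
    state = apply_transition(state, "submit", "contributor")
    state = apply_transition(state, "publish", "editor")
    print(state)
```

Keeping the whole workflow in one small table is the point of the illustration: every state, transition and role is visible at a glance, and adding a review step means adding one entry rather than a new checklist.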