Pleiades Content Methods

This page covers what we need to do to get content (especially legacy content) into Pleiades, and the tools we use to make that happen.

CAP Map-by-Map Directory Listings

Let's call the XHTML data "hDir" and the XML data "xDir".

  • Convert from MSWord to well-formed XHTML, minus MSWord tags, and put somewhere safe
  • Convert from MSWord to some kind of relatively clean, semantic XML and put somewhere safe
    • first stab done 18 November 2008; the result is on atlantides.org under /var/pleiades/batlas/dirs/xml/ (produced with the code in dir2xml.py in BADataMunger; see README.txt)
    • this basically munges together the batlasid XML with the old-style frankenformat XML

CAP Bibliography Listings

As above: "hBib" for the XHTML data, "xBib" for the XML data.

  • Extract from directory XHTML (see above) and convert to XHTML term lists instead of just paragraphs
  • Convert the XHTML term lists to MODS by parsing each entry and matching it, where possible, against the AWMC bibliographic database content, which is already in MODS
  • Submit book records to Open Library
  • Keep existing bibliographic service
    • prep and upload tooling needs to be streamlined and made easier to use (BibIt, plus I think some BADataMunger dependencies)
    • make Zotero interop better (closer to lossless) than the current COinS support
  • Current bibliographic data is in the pleiades-bibliography module
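
As a rough illustration of the "parsing" half of the term-list step above, here is a minimal sketch that reads an XHTML definition list into (term, citation) pairs. It assumes hBib uses dl/dt/dd markup in the XHTML namespace, which these notes do not confirm. The function name parse_term_list and the sample entries are invented placeholders, not real hBib data. Conversion to MODS and matching against the AWMC database are not attempted here.

```python
import xml.etree.ElementTree as ET

# Assumption: hBib term lists are <dl>/<dt>/<dd> elements in the XHTML namespace.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def parse_term_list(xhtml_text):
    """Return a list of (term, citation) pairs from every <dl> in the document."""
    root = ET.fromstring(xhtml_text)
    entries = []
    for dl in root.iter(XHTML_NS + "dl"):
        term = None
        for child in dl:
            if child.tag == XHTML_NS + "dt":
                # Remember the most recent term (short title / abbreviation).
                term = "".join(child.itertext()).strip()
            elif child.tag == XHTML_NS + "dd" and term is not None:
                # Flatten inline markup and collapse runs of whitespace.
                citation = " ".join("".join(child.itertext()).split())
                entries.append((term, citation))
    return entries


# Placeholder input: invented entries, for illustration only.
sample = """<html xmlns="http://www.w3.org/1999/xhtml"><body><dl>
<dt>Smith 1900</dt>
<dd>J. Smith, <i>An Invented Title</i>, Nowhere, 1900</dd>
<dt>Jones 1901</dt>
<dd>A. Jones,
    <i>Another Invented Title</i>, Elsewhere, 1901</dd>
</dl></body></html>"""

entries = parse_term_list(sample)
for term, citation in entries:
    print(term, "=>", citation)
```

Keeping the output as plain pairs leaves the later MODS mapping and database matching free to change without touching the parser.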

Matching CAP-list-extracted data with map-digitized data

  • extract/retool the existing logic that's used to match digitized map data with the directory data so it can be cleanly packaged for use in a tool

Fuzziness

Resolving ambiguity between map sheet features and map directory records is the community's work. Development should focus on indicating ambiguity and showing possibilities.

Example:

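Map grid A1 has two unlabeled bridge features, and the directory has two records for "unnamed bridges" in grid A1. Nothing in the data says which record belongs to which feature, so the tool should present both pairings as possible rather than pick one.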
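
One way to "indicate ambiguity and show possibilities" for a case like this is sketched below. It is only a sketch: the dictionary keys (id, grid, kind), the identifiers f1/f2/r1/r2, and the function name candidate_matches are all hypothetical and do not reflect the existing matching logic. It groups features and records by grid square and feature type, lists every possible pairing, and flags a group as ambiguous when there is more than one.

```python
from collections import defaultdict
from itertools import product


def candidate_matches(features, records):
    """List every feature/record pairing per (grid, kind) and flag ambiguity."""
    groups = defaultdict(lambda: {"features": [], "records": []})
    for feature in features:
        groups[(feature["grid"], feature["kind"])]["features"].append(feature["id"])
    for record in records:
        groups[(record["grid"], record["kind"])]["records"].append(record["id"])

    report = []
    for (grid, kind), group in sorted(groups.items()):
        # Every feature could correspond to every record in the same group.
        pairs = list(product(group["features"], group["records"]))
        report.append({
            "grid": grid,
            "kind": kind,
            "candidates": pairs,
            # More than one possible pairing means a person has to decide.
            "ambiguous": len(pairs) > 1,
        })
    return report


# The example above: two unlabeled bridges on the map, two directory records.
features = [
    {"id": "f1", "grid": "A1", "kind": "bridge"},
    {"id": "f2", "grid": "A1", "kind": "bridge"},
]
records = [
    {"id": "r1", "grid": "A1", "kind": "bridge"},
    {"id": "r2", "grid": "A1", "kind": "bridge"},
]

report = candidate_matches(features, records)
for entry in report:
    print(entry)
```

The function never chooses a pairing itself. It only reports the candidates, which leaves the resolution to the community as described above.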