Discussion of mapping, aggregation and crosswalking at DukeWorkshop
This information is out of date as of 21 June 2009. See now MapMaker and Aggregator.
This is separate from, but relevant to, a discussion of the PapyrologicalNumbersServer.
Definitions
- What is "mapping"? The "mapper"? The process, originally coded by ZA and employing XSL transforms, that determined relationships between DDB texts, HGV metadata, HGV translations, TM numbers and "Perseus numbers". It produced a mapping.xml file, which the "aggregator" step in "runner" used to produce aggregated EpiDoc XML.
- What is "aggregation"? The "aggregator"? An XSLT process that starts with the DDB texts; for each, it uses the mapping.xml file to see whether an HGV text exists: if so, it aggregates the two; if not, it creates a DDB-only output. As a next step, it goes through the HGV and finds anything without a DDB number, and creates an HGV-only output for that. Fired off by runner, but is a "pure" XSLT process.
- What is "crosswalker"? Does not exist. We have a crosswalk: the part of runner that converts from HGV XML dump to EpiDoc. Crosswalker was supposed to be the abstract tool that would, from a relatively simple input, generate one or more crosswalks. A deliverable under IDP1, deferred, and now a deliverable under IDP2. May not be wanted. Need to arrive at consensus. Requirement to convert (i.e., to crosswalk) repeatedly from HGV dump to EpiDoc XML remains central to our needs.
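The two-pass aggregation logic defined above can be sketched roughly as follows. This is only an illustration in Python, not the actual XSLT: the function name `aggregate`, the identifier strings, and the representation of mapping.xml as a plain dictionary from DDB identifier to HGV identifier are all assumptions made for the example, and a one-to-one DDB/HGV relationship is assumed for simplicity.

```python
def aggregate(ddb_ids, hgv_ids, mapping):
    """Sketch of the two-pass aggregation.

    ddb_ids: iterable of DDB text identifiers (hypothetical values)
    hgv_ids: iterable of HGV record identifiers (hypothetical values)
    mapping: dict of DDB id -> HGV id, an assumed stand-in for mapping.xml
    Returns a list of (kind, ddb_id, hgv_id) tuples.
    """
    outputs = []

    # Pass 1: start from the DDB texts and consult the mapping for each.
    for ddb in ddb_ids:
        hgv = mapping.get(ddb)
        if hgv is not None:
            outputs.append(("aggregated", ddb, hgv))
        else:
            outputs.append(("ddb-only", ddb, None))

    # Pass 2: go through the HGV records and pick up anything
    # that has no DDB number according to the mapping.
    hgv_with_ddb = set(mapping.values())
    for hgv in hgv_ids:
        if hgv not in hgv_with_ddb:
            outputs.append(("hgv-only", None, hgv))

    return outputs


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    for row in aggregate(["ddb1", "ddb2"], ["hgv1", "hgv3"], {"ddb1": "hgv1"}):
        print(row)
```

Every input record ends up in exactly one of three kinds of output (aggregated, DDB-only, HGV-only), which is the property the questions below about matching mapper outputs to aggregator inputs depend on.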
Mapping and aggregation
- What information in what datasets is used to effect the mapping process (all of them should have been enumerated in DukeWorkshopIdentDisc)? Where are these files stored?
- What code is used to effect the mapping process? What are the data outputs? Where are they stored?
- Do the mapping outputs match the aggregator inputs?
- Does "crosswalker" have anything to do with this stuff?
