Papyrological Navigator

one component of IDPSoftware

http://papyri.info

The PN is "a customized search engine ... capable of retrieving information from multiple related sites." It is a custom web application, prototyped at Columbia, that is intended to replace the current production applications for APIS and DDBDP and to provide access to the full content of HGV records. It is therefore meant to search both metadata and texts, and to display metadata records, texts and images pulled from all three source datasets. Links to other data sources, such as Trismegistos, are planned for the near term. Expanding the content to incorporate other full datasets is envisioned further down the road. The PN employs the portal metaphor.

Background on the PN, from Columbia.

Software and language components:

Lots of Java:

  • Apache Tomcat
  • Apache Jetspeed-2 portlet container
    • portlet spec
    • Velocity templates
    • Apache Derby supports the portlet container, but is not really part of the app
    • Some javax.naming implementations to bridge the bits of a request served by the container to the portlet app
  • Lucene
    • For the PN, really just indexing code. The DDb has lots of fancy tokenizers, etc.
  • eRez/FSI image-serving software, and the Flash plugin that FSI licenses for viewing images
    • a web-app that serves up XML config files for the Flash plugin based on the APIS id of the document in question

Modules/Portlets

Metadata Search Portlet

http://apptest.cul.columbia.edu:8082/navigator/portal/default-page.psml

This is the only component currently linked from the top level at papyri.info.

Image Portlet

The Image Portlet provides for the online viewing and limited manipulation of APIS-hosted images via the PN. This portlet is supported server-side by licensed proprietary software: eRez/FSI image server and Flash Plugin FSI. Prior development has taken advantage of institutional licenses at Columbia for these. Columbia's licenses will continue to cover production needs through July 2010. Development work at NYU may require access to separate licenses, and when NYU takes over production responsibilities in 2010, licenses will definitely be required.

Text Portlet

Experimental text search interface (not linked from main public page yet): http://appdev.cul.columbia.edu:8082/ddbdp-nav/search

Statistics regarding content indexed for same (also not public yet): http://appdev.cul.columbia.edu:8082/ddbdp-nav/stats

The Text Portlet (a.k.a. the DDbDP app) supports rendering and manipulation of DDbDP-originated papyrus texts in their original languages. No proprietary software is involved. The application takes advantage of SRU in its query interface. The search back-end currently expects a valid CQL query ( http://www.loc.gov/standards/sru/specs/cql.html ) rather than the input from any particular HTML form. The number of clauses in a search is therefore limited only by how many inputs the HTML form provides.
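Because the back-end consumes CQL rather than form fields, a form of any size reduces to the same kind of query string. The following is a minimal sketch of that idea, not the DDbDP app's actual code: the class `CqlSketch`, the index names `text` and `series`, and the endpoint `http://example.org/sru` are all hypothetical. Only the CQL clause syntax and the SRU `operation`, `version` and `query` parameters come from the published specifications.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: joins any number of search clauses into one CQL query. */
class CqlSketch {

    /** One clause: a CQL index, a relation, and a search term. */
    static final class Clause {
        final String index;
        final String relation;
        final String term;

        Clause(String index, String relation, String term) {
            this.index = index;
            this.relation = relation;
            this.term = term;
        }

        /** Renders the clause as: index relation "term". */
        String toCql() {
            // Inside a quoted CQL term, backslash and double quote are escaped with a backslash.
            String escaped = term.replace("\\", "\\\\").replace("\"", "\\\"");
            return index + " " + relation + " \"" + escaped + "\"";
        }
    }

    /** Joins the clauses left to right with one boolean operator ("and", "or" or "not"). */
    static String toCql(List<Clause> clauses, String booleanOp) {
        List<String> parts = new ArrayList<>();
        for (Clause clause : clauses) {
            parts.add(clause.toCql());
        }
        return String.join(" " + booleanOp + " ", parts);
    }

    /** Wraps a CQL query in an SRU searchRetrieve request URL (form-style encoding, spaces become '+'). */
    static String toSruUrl(String baseUrl, String cql) {
        return baseUrl + "?operation=searchRetrieve&version=1.1&query="
                + URLEncoder.encode(cql, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Index names and the endpoint are hypothetical placeholders.
        List<Clause> clauses = List.of(
                new Clause("text", "=", "logos"),
                new Clause("series", "=", "bgu"));
        String cql = toCql(clauses, "and");
        System.out.println(cql);
        System.out.println(toSruUrl("http://example.org/sru", cql));
    }
}
```

Adding a third input to the form would only add a third `Clause` to the list; nothing on the query side needs to change.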

Translation Portlet

??????? APIS translations. HGV translations.

Getting data into the PN

A workflow for refreshing data is still pretty much nonexistent, although automating it is an [IDP1] deliverable for June/July 2008. As of now, basically three huge XML files are indexed by Lucene: the APIS data, the HGV data, and a composite file called 'aggregated' (http://epiduke.cch.kcl.ac.uk/aggregated/) that holds a mapped subset of the other two. At some point in the near future we will probably move to putting up a harvestable interface to APIS data instead of relying on the Big XML File. In any case, that is what goes into the PN as of now.
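Files of that size are usually read as a stream rather than loaded whole. The sketch below shows one way to walk such a file with the JDK's StAX parser and hand each record to a callback, which is where an indexer would build its Lucene document. It is an illustration under assumptions, not the PN's indexing code: the class `RecordStreamSketch`, the element names `record`, `id` and `title`, and the flat one-level record layout are all hypothetical, and the Lucene calls themselves are left out.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Sketch only: streams records out of one large XML file, one at a time. */
class RecordStreamSketch {

    /**
     * Calls handler once per record element. Each record is passed as a map from
     * child element name to trimmed text (assumes flat, one-level records).
     */
    static void forEachRecord(Reader xml, String recordElement,
                              Consumer<Map<String, String>> handler) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, true); // merge adjacent text and CDATA
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);  // do not process DTDs
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(xml);
            Map<String, String> current = null; // record being assembled, if any
            String field = null;                // child element being read, if any
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals(recordElement)) {
                        current = new LinkedHashMap<>();
                    } else if (current != null) {
                        field = name;
                        text.setLength(0);
                    }
                } else if (event == XMLStreamConstants.CHARACTERS && field != null) {
                    text.append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals(recordElement) && current != null) {
                        handler.accept(current); // an indexer would add a document here
                        current = null;
                        field = null;
                    } else if (name.equals(field)) {
                        current.put(field, text.toString().trim());
                        field = null;
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML input", e);
        }
    }

    public static void main(String[] args) {
        // Tiny stand-in for one of the large source files.
        String sample = "<records>"
                + "<record><id>a.1</id><title>Letter</title></record>"
                + "<record><id>a.2</id><title>Receipt</title></record>"
                + "</records>";
        forEachRecord(new StringReader(sample), "record", System.out::println);
    }
}
```

Only one record is held in memory at a time, so the same loop would work whether the input is a single large file or a harvested feed delivered in pieces.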

Specific PN Software Development Tasks

  • Interface Assessment: "conduct usability testing in order to identify areas of possible improvement"
  • Enhancements to the new (Greek) lemmatized and proximity searching
  • Refinement of combined metadata and text searching functionality
  • Mechanisms for automatic ingestion of updated EpiDoc / Unicode texts and metadata from the DDbDP and HGV
  • Implementation of basic EpiDoc support for APIS translations
  • Investigation and, if feasible, implementation of XML-based interoperability with non-APIS digital projects like the German Papyrus Portal and databases involved in Trismegistos

Documents:

  • In the APIS Phase 6 proposal (pdf), see:
    • Section 5.1: Interface Assessment, PDF document pages 21-22
    • Section 5.2.1: Papyrological Navigator (papyri.info), PDF document pages 22-23.
    • Appendix D: APIS/Papyrological Navigator Work Plan Details, PDF document pages 39-45

Source code

Not yet imported into the IDP SVN repository.

Related Milestones

Related tickets (query)

See also: NavigatorProgress

  • #2 Aggregation incomplete
  • #5 HGV publication numbers with spans being parsed incorrectly
  • #6 Synthesized HGV metadata must not be displayed as HGV without caveat
  • #8 Final page of search results shows incorrect summary information
  • #9 HGV Translation portlet's SOURCE link goes to HGV metadata
  • #10 DDb searches within Series that have NO VOLUMES are not working
  • #23 Papyrologists close down PN google doc
  • #36 Revisit and prepare Mapping file for Leuven
  • #37 Ensure that there is an image viewing tool for ISAW-hosted PN
  • #38 Text export option
  • #41 Entry with Unicode characters in PN
  • #42 Browsers and PN
  • #43 Standardize sorted display of publications in PN result sets
  • #44 Use of grc-Grek in PN display
  • #45 Missing texts, literary texts, collaboration with LDAB
  • #47 Results screen should display line numbers in regular numerical order
  • #48 Allow error reporting on every page of the PN.
  • #50 Only thumbnail images in PN
  • #54 spec storage requirements for PN/NS footprint at NYU/diglib
  • #56 Change publication search in PN to fall back to a prefix search
  • #69 consolidate/document IDP2 and Concordia PN tasks for PN programmer
  • #75 report on how PN gets IDP content now
  • #92 Institute maven build system for PN
  • #93 Set up NYU maven repository
  • #94 Reorganize PN projects in the svn repo
  • #95 Port pn-metadata-indexers to maven
  • #96 Port pn-ddbdp-indexers project to maven
  • #98 Document#parseName may be ambiguous/incorrect
  • #102 Number server indexing source fields
  • #103 Number server x is related to y functionality
  • #104 Publication Series Drop Down Menu Fix
  • #105 Links between Static and PN views
  • #107 XML filenames with a '+' literal are being incorrectly decoded
  • #108 PN hierarchal browsing
  • #109 PN linear browsing
  • #110 PN Tab overhaul
  • #112 PN Tabs --> Collapsible portlet view???
  • #113 PN user profiles
  • #114 Tab Reform
  • #115 DDBDP Search tab reform
  • #116 Numbers Search tab reform
  • #117 Combined and Rationalized Search tabs
  • #118 PN Tab overhaul: News & Updates
  • #119 Add Linkage to digitized editions elsewhere on web
  • #120 Link out to Trismegistos from PN
  • #123 PN to SoSOL linking
  • #125 clean, stable URLs for PN content
  • #126 GAWDly Atom feeds for PN content
  • #127 address assessment of other projects
  • #129 Window in PN for communications
  • #132 Port pn-numbers to maven
  • #135 Refactor tests for pn-ddbdp-indexers
  • #140 Columbia PN: make DEV public
  • #141 port pn-jndi to maven
  • #142 Port pn-ddbdp-portlets project to maven
  • #143 Port pn-metadata-portlets to maven
  • #144 Refactor transcoder for maven
  • #152 Port jetspeed-navigator to maven
  • #162 Set up data locations for PN
  • #164 Debug PN
  • #187 Sort out deployment method
  • #188 Greek search does not return highlighted results
  • #189 DDBDP info appears in HGV Translation portlet
  • #190 Unicode fraction display
  • #191 Lemmatized Searching not functioning in NYU PN
  • #192 NYU PN: substring searching weirdness
  • #193 NYU PN vs Col PN different number of returns on same DDBDP search
  • #194 NYU PN HTML display of DDBDP texts
  • #195 NYU PN vs Col PN different number of returns on same APIS search
  • #196 NYU PN vs Col PN different number of returns on same Numbers search
  • #197 Unicode unclear fraction display
  • #213 DDbDP search page off by one
  • #221 Create second tomcat instance for PN
  • #227 Incorrect linkage
  • #228 Incorrect Display of Search Results
  • #229 Proximity Searching Errors
  • #230 The limiters AND and NOT apparently not working
  • #231 NYU PN: Search link on main page
  • #234 Duplicate HGV metadata
  • #236 "Clear Form" on DDbDP Search Page Disables "input in beta code"
  • #249 PN HGV metadata
  • #251 Leiden missing from Greek in "initial results"
  • #252 Metadata appearing in Greek text box on "initial results" page
  • #254 Incomplete Search Results (choice vs app)
  • #255 In IE Greek dropping out of PN
  • #266 PN user experience / information architecture: data layout
  • #267 PN user experience / information architecture: search forms
  • #268 PN user experience / information architecture: Search Results
  • #269 PN user experience / information architecture: 'Revise Search'
  • #270 PN user experience / information architecture: basic / advanced search forms
  • #283 PN HGV metadata "Publication" field
  • #289 Query Parser for search strings
  • #299 Check into impact of <lb type="worddiv">
  • #306 Add Latin lemmatized searching
  • #310 encoding error in Apis description (Qur'an)
  • #311 Normalize fonts for site
  • #312 Some Latin texts dropping out
  • #313 "Clear Form" not working in DDBDP Search
  • #314 Text of P.Wisc, II 59 missing from PN
  • #319 Modify XSLTs to generate new PN view pages
  • #321 Deploy new static views in place of PN views
  • #322 Unify metadata and full text search indices
  • #323 Unify search result views
  • #379 add "I'm interested" mechanism to SoSOL/PN for APIS records
  • #380 image viewer features
  • #381 incorporate BYU multispectral image viewing (manipulation?) into PN
  • #383 Greek Wildcard searching in PN not working
  • #389 New Image Server
  • #390 Reorganize PN XSLT to be separate from EpiDoc XSLT
  • #396 Concordia Testing
  • #428 Integrate APIS EpiDoc records into PN indexing workflow
  • #473 Surface old collection.xml equivalencies for posterity
  • #505 wrong URL for some XML components given in PN display when browsing by APIS or HGV
  • #506 APIS Translation (English) heading repeated