Acquisition of Knowledge of Vernacular and Vague Place Names

Chris Jones et al.

  • context, again, is information retrieval
  • Place names used to specify location on internet (timetables, routing instructions etc)
  • Common cause of query failure = use of vernacular names
  • Gazetteers tend to reflect an administrative geography
  • Vernacular place names
    • not recorded in conventional gazetteers
    • often no precise boundary
  • Need to acquire this knowledge
  • Source of place name knowledge
    • gazetteers
    • maps and terrain models with place annotation
      • association of names with terrain features
      • difficult to derive extent
    • people
      • interviews/questionnaires - traditional methods inefficient, but web offers potential
    • text documents
      • detailed descriptions difficult to interpret automatically
      • associations between places in text may be useful
  • Exploiting the web for place name knowledge
    • web mining / harvesting
      • find the spatial extent of vague places in terms of the places that lie within them
      • discover vague place names through named entity analysis
    • web questionnaires
      • elicit personal knowledge of vague places (? and vague spatial language)
  • web mining
    • documents that refer to vague places may also refer to more precise places inside them
    • places that occur frequently in association with a target named place may have a higher chance of being inside it
    • procedure
      • web search engine queries referring to a target place
      • parse resulting high rank pages for other placenames
      • geocode them
      • use modeling to determine association, coherence, gravity etc. - density surfaces, for example
    • formulating appropriate web queries
      • region/place only (e.g., Rocky mountains)
      • region + concept (e.g., hotels in the Cotswolds)
        • tends to retrieve directory pages listing places associated with the target place
      • region and lexical pattern (trigger phrase), e.g., "Midwest towns such as"
        • the hope is that, in the result documents, the phrase is followed by the region name or by other places associated with the target
      • region + concept produces highest numbers of co-associated places in top ranking documents
  • geoparsing
    • use named entity recognition (NER) methods to identify names
    • gazetteers used to recognize place names
    • distinguish between geographical and non-geographic uses - many place names occur in organization names and in people's names
      • use rules/patterns to identify these cases, e.g.,
        • <forename><placename> indicates a person's name: John London
        • "ask john frank"
  • geocoding
    • assigning coordinates to a name
    • multiple occurrences of the same name are a major problem
      • more sophisticated: search for co-occurrence of parent and neighboring places that establish uniqueness
      • crude approach: default interpretation -- assume most commonly occurring instance
  • web questionnaires
    • elicit personal knowledge of individuals
      • various options/approaches
    • OS-funded project starts 2007
  • future work
    • web mining has potential, but currently flawed
      • automate thresholding of surfaces
      • quality of geoparsing + geocoding needs to be improved
      • problem of areas that have no settlements
    • apply web questionnaire methods
      • evaluate different methods of elicitation and geometric modelling
      • use as "ground truth" for web mining
    • web questionnaires for knowledge of vague spatial language (west, near, between, ...)?
    • terrain models to extract topographic features
    • TRIPOD - European project for image search engines - interpret and generate geographic descriptions of archive images and images from location-aware cameras
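
The web mining procedure in these notes (parse retrieved pages for place names, geocode them, model a density surface, threshold it) can be sketched roughly as follows. This is an illustration only, not the speakers' implementation. Everything in it is an assumption made for the example: the toy gazetteer, the place names and planar km coordinates, the forename list, the stand-in "retrieved documents", the Gaussian kernel, the bandwidth, and the hand-picked threshold.

```python
import math
from collections import Counter

# Hypothetical toy gazetteer: all names and coordinates are made up.
# Each name maps to candidate senses: (x_km, y_km, parent_region, frequency).
GAZETTEER = {
    "Ashby":  [(10.0, 10.0, "Northshire", 3)],
    "Milton": [(12.0, 11.0, "Northshire", 1), (80.0, 75.0, "Southshire", 9)],
    "Orford": [(11.0, 13.0, "Northshire", 2)],
}
FORENAMES = {"John", "Mary"}  # hypothetical list for the <forename><placename> rule

# Stand-ins for high-rank pages returned by a "region + concept" query.
DOCS = [
    "Hotels in Ashby, Milton and Orford, Northshire.",
    "Guest houses near Ashby and Orford.",
    "Milton inns",  # no parent region mentioned, so the default sense is used
]

def tokenize(text):
    """Split on whitespace and strip simple punctuation."""
    return [t.strip(".,;:()") for t in text.split()]

def geoparse(text):
    """Gazetteer lookup; skip a place name that directly follows a forename."""
    tokens = tokenize(text)
    return [tok for prev, tok in zip([""] + tokens, tokens)
            if tok in GAZETTEER and prev not in FORENAMES]

def geocode(name, context_words):
    """Prefer a sense whose parent region co-occurs in the document;
    otherwise fall back to the most frequent sense (default interpretation)."""
    senses = GAZETTEER[name]
    for sense in senses:
        if sense[2] in context_words:
            return sense[:2]
    return max(senses, key=lambda s: s[3])[:2]

def mine(documents):
    """Count, per geocoded location, the documents that mention it."""
    counts = Counter()
    for doc in documents:
        words = set(tokenize(doc))
        for name in set(geoparse(doc)):
            counts[geocode(name, words)] += 1
    return counts

def density(counts, x, y, bandwidth=3.0):
    """Gaussian kernel density at (x, y), weighted by document counts."""
    return sum(
        w * math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * bandwidth ** 2))
        for (px, py), w in counts.items()
    )

def extent(counts, threshold, step=1.0, size=100):
    """Grid cells whose density reaches the (hand-picked) threshold."""
    n = int(size / step)
    return [(i * step, j * step)
            for i in range(n) for j in range(n)
            if density(counts, i * step, j * step) >= threshold]

if __name__ == "__main__":
    counts = mine(DOCS)
    cells = extent(counts, threshold=1.5)
    print(len(cells), "grid cells estimated to lie inside the vague region")
```

In this toy run the third document geocodes "Milton" to the wrong sense, which shows why multiple occurrences are a problem. The thresholded surface drops that isolated point because a single mention there stays below the cutoff. Choosing the cutoff by hand is exactly the step the notes list as needing automation.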