Acquisition of Knowledge of Vernacular and Vague Place Names
Chris Jones et al.
- context, again, is information retrieval
- Place names used to specify location on the internet (timetables, routing instructions, etc.)
- Common cause of query failure = use of vernacular names
- Gazetteers tend to reflect an administrative geography
- Vernacular place names
- not recorded in conventional gazetteers
- often no precise boundary
- Need to acquire this knowledge
- Source of place name knowledge
- gazetteers
- maps and terrain models with place annotation
- association of names with terrain features
- difficult to derive extent
- people
- interviews/questionnaires - traditional methods inefficient, but web offers potential
- text documents
- detailed descriptions difficult to interpret automatically
- associations between places in text may be useful
- Exploiting the web for place name knowledge
- web mining / harvesting
- find the spatial extent of vague places in terms of the places that lie within them
- discover vague place names through named entity analysis
- web questionnaires
- elicit personal knowledge of vague places (? and vague spatial language)
- web mining
- documents that refer to vague places may also refer to more precise places inside them
- places that occur frequently in association with a target named place may have a higher chance of being inside it
- procedure
- web search engine queries referring to a target place
- parse resulting high rank pages for other placenames
- geocode them
- use modeling to determine association, coherence, gravity etc. - density surfaces, for example
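
A minimal sketch of the density-surface step, not the method from the talk: it assumes the co-occurring place names have already been geocoded to (lon, lat, count) triples, uses a plain Gaussian kernel on a regular grid in degrees, and cuts the surface at a fixed fraction of its peak. The function names, `bandwidth`, `cell` and `fraction` are all illustrative assumptions.

```python
import math

def density_surface(points, bbox, cell=0.1, bandwidth=0.2):
    """Gaussian kernel density of weighted points on a regular grid.

    points: iterable of (lon, lat, weight); weight could be the number of
            pages in which the place co-occurred with the target name.
    bbox:   (min_lon, min_lat, max_lon, max_lat).
    Distances are planar in degrees, a simplification for small regions.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    ncols = int(round((max_lon - min_lon) / cell)) + 1
    nrows = int(round((max_lat - min_lat) / cell)) + 1
    grid = []
    for r in range(nrows):
        y = min_lat + r * cell
        row = []
        for c in range(ncols):
            x = min_lon + c * cell
            row.append(sum(
                w * math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * bandwidth ** 2))
                for px, py, w in points
            ))
        grid.append(row)
    return grid

def threshold(grid, fraction=0.5):
    """Mark cells whose density is at least `fraction` of the peak value."""
    peak = max(max(row) for row in grid)
    return [[value >= fraction * peak for value in row] for row in grid]

# Hypothetical geocoded co-occurrences: two close together, one outlier.
points = [(0.0, 0.0, 3), (0.1, 0.0, 2), (1.0, 1.0, 1)]
grid = density_surface(points, bbox=(0.0, 0.0, 1.0, 1.0), cell=0.5)
mask = threshold(grid, fraction=0.5)
```

The cells marked in `mask` give a crude estimate of the vague region's extent; the fixed `fraction` is a manual choice here.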
- formulating appropriate web queries
- region/place only (e.g., Rocky Mountains)
- region + concept (e.g., hotels in Cotswolds)
- tends to retrieve directory pages listing places associated with the target place
- region and lexical pattern (trigger phrase), e.g., "Midwest towns such as"
- the hope is that, in the result documents, the phrase is followed by the region name or by other places associated with the target
- region + concept produces highest numbers of co-associated places in top ranking documents
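
The three query styles above can be sketched as a small string builder. This is only an illustration: `build_queries`, the quoting convention and the `{region}` template placeholder are assumptions, and no particular search engine API is implied.

```python
def build_queries(region, concepts=(), trigger_templates=()):
    """Return (style, query) pairs for the three query styles in the notes."""
    # 1. region/place only
    queries = [("region", f'"{region}"')]
    # 2. region + concept, e.g. hotels in "Cotswolds"
    queries += [("region+concept", f'{concept} in "{region}"') for concept in concepts]
    # 3. region + lexical pattern (trigger phrase), e.g. "Midwest towns such as"
    queries += [
        ("region+trigger", '"' + template.format(region=region) + '"')
        for template in trigger_templates
    ]
    return queries

queries = build_queries(
    "Cotswolds",
    concepts=["hotels"],
    trigger_templates=["{region} towns such as"],
)
```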
- geoparsing
- use named entity recognition (NER) methods to identify names
- gazetteers used to recognize place names
- distinguish between geographical and non-geographic uses - many place names occur in organization names and in people's names
- use rules/patterns to identify these cases, e.g.,
- <forename><placename> indicates a person's name: John London
- "ask john frank"
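
A minimal sketch of gazetteer lookup combined with the <forename><placename> rule. The gazetteer and forename sets are toy assumptions, only single-word names are handled, and a real system would use a full NER pipeline instead of this token scan.

```python
import re

GAZETTEER = {"London", "Paris", "Bath"}   # hypothetical toy gazetteer
FORENAMES = {"John", "Jack", "Mary"}      # hypothetical forename list

def find_place_names(text):
    """Return gazetteer matches that are not preceded by a known forename."""
    tokens = re.findall(r"[A-Za-z]+", text)
    places = []
    for i, token in enumerate(tokens):
        if token not in GAZETTEER:
            continue
        previous = tokens[i - 1] if i > 0 else ""
        if previous in FORENAMES:
            # <forename><placename>: treat as a person's name, e.g. John London
            continue
        places.append(token)
    return places

found = find_place_names("John London took the train from Paris to London.")
```

Here `found` keeps "Paris" and the second "London" but drops the surname use.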
- geocoding
- assigning coordinates to a name
- multiple occurrences of the same name are a major problem
- more sophisticated: search for co-occurrence of parent and neighboring places that establish uniqueness
- crude approach: default interpretation -- assume most commonly occurring instance
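
The two disambiguation strategies can be sketched together: try the parent/neighbor co-occurrence test first and fall back to the default interpretation. The candidate table is illustrative toy data (approximate coordinates, a made-up `prominence` rank standing in for "most commonly occurring"), and `geocode` is a hypothetical helper.

```python
CANDIDATES = {
    "Newport": [
        {"parent": "Wales", "neighbours": {"Cardiff"},
         "lat": 51.59, "lon": -3.00, "prominence": 2},
        {"parent": "Isle of Wight", "neighbours": {"Cowes"},
         "lat": 50.70, "lon": -1.29, "prominence": 1},
    ],
}

def geocode(name, context_names, candidates=CANDIDATES):
    """Pick one candidate for `name`, using other names found in the document."""
    options = candidates.get(name, [])
    if not options:
        return None
    context = set(context_names)
    # More sophisticated: a parent or neighboring place mentioned in the
    # same document establishes which instance is meant.
    matched = [
        option for option in options
        if option["parent"] in context or context & option["neighbours"]
    ]
    if len(matched) == 1:
        return matched[0]
    # Crude default: assume the most prominent (most common) instance.
    return max(options, key=lambda option: option["prominence"])

by_context = geocode("Newport", ["Cowes", "ferry"])
by_default = geocode("Newport", [])
```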
- web questionnaires
- elicit personal knowledge of individuals
- various options/approaches
- OS-funded project starts 2007
- future work
- web mining has potential, but currently flawed
- automate thresholding of surfaces
- quality of geoparsing + geocoding needs to be improved
- problem of areas that have no settlements
- apply web questionnaire methods
- evaluate different methods of elicitation and geometric modelling
- use as "ground truth" for web mining
- web questionnaire for knowledge of vague spatial language (west, near, between, ...)?
- terrain models to extract topographic features
- TRIPOD - European project for image search engines - interpret and generate geo descriptions of archive images and images from location-aware cameras
