LORELEI Entity Detection and Linking Knowledge Base
- Resource
- URL
- https://dss2.princeton.edu/data/303/
- Blurb
-
Contains the full LORELEI Entity Detection and Linking (EDL) Knowledge Base (KB) used for all LORELEI Representative Language Pack and Incident Language Pack entity linking annotation. The KB content was drawn from GeoNames, the CIA World Leaders List and the CIA World Factbook, and was supplemented with manually created KB entries developed specifically for LORELEI data.
The LORELEI (Low Resource Languages for Emergent Incidents) Program was concerned with building human language technology for low-resource languages in the context of emergent situations such as natural disasters or disease outbreaks. Linguistic resources for LORELEI include Representative Language Packs and Incident Language Packs for over two dozen low-resource languages, comprising data, annotations, basic natural language processing tools, lexicons and grammatical resources. Representative languages were selected to provide broad typological coverage, while incident languages were selected to evaluate system performance on a language whose identity was disclosed at the start of the evaluation.
This corpus consists of an English knowledge base that supports the EDL task in LORELEI for four entity types: geo-political entities (GPE), locations, including facilities (LOC), persons (PER) and organizations (ORG). There are four inputs to the KB, each designated by a unique "origin" code in the KB, as follows: GPE and LOC entities from a 2015 snapshot of GeoNames, PER entities from the CIA World Leaders List dated May 2015, ORG entities from Appendix B of the CIA World Factbook downloaded in 2015, and additional entities manually created by LDC for each of the representative and incident languages.
The KB contains a total of 10,216,832 entities and consists of three tab-delimited files, which are linked via the entityid in each entry. More information is contained in the included documentation.
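As a rough illustration of how three tab-delimited files might be linked through the shared entityid field, here is a minimal Python sketch. It assumes each file has a header row containing a column literally named "entityid"; the actual file names and column layout are not given here and should be taken from the included documentation. The function name link_by_entityid is hypothetical.

```python
import csv
from collections import defaultdict


def link_by_entityid(paths, key="entityid"):
    """Group rows from several tab-delimited files by a shared key column.

    Assumption: every file has a header row that includes `key`.
    Returns {entity id: {file path: [row dicts from that file]}}.
    """
    linked = defaultdict(dict)
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            # QUOTE_NONE treats quote characters as ordinary text,
            # which is usually what plain tab-delimited exports expect.
            reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:
                linked[row[key]].setdefault(path, []).append(row)
    return dict(linked)
```

With roughly ten million entities, holding every row in memory this way may be impractical; loading the files into a database table indexed on entityid is likely a better fit for the full KB, and the sketch is meant only to show the join key.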
- Link time
- 2020-05-22 18:23:00 UTC
- Sample
- Principal investigator
- Producer
- Distributor
- Version
- More detail URL
- Resource type
- Single study
- Subjects
- Art & Culture
- Regions
- Countries