LORELEI Oromo Incident Language Pack

Resource
URL
https://dss2.princeton.edu/data/304/
Blurb

Oromo is a Cushitic language spoken in Ethiopia, Kenya, Somalia and Egypt, and it is the third largest language in Africa. Data was collected in the following genres: news, social network, weblog, newsgroup, discussion forum, and reference material. Entity detection and linking annotation identified entities to be detected by systems for scoring purposes. Situation frame analysis was designed to extract basic information about needs and relevant issues for planning a disaster response effort. Also included in this release are lexical and grammatical resources as well as three tools: two to recreate original source data from the processed XML material and one to condition text data that users download from Twitter. Monolingual, parallel and comparable text is presented in XML with associated DTDs. Entity detection and linking and situation frame annotation data is presented as tab-delimited files. All text is UTF-8 encoded. The knowledge base for entity linking annotation in this corpus and in all LORELEI Representative Language and Incident Language Packs is available separately as LORELEI Entity Detection and Linking Knowledge Base (LDC2020T10).

Link time
2020-05-22 18:28:00 UTC
Resource type
Single study
Subjects
  • Art & Culture
Regions
  • Africa
Countries
  • Egypt
  • Ethiopia
  • Kenya
  • Somalia