Annotated English Gigaword
- URL
- https://dss2.princeton.edu/data/69/
- Description
-
Adds automatically generated syntactic and discourse structure annotation to English Gigaword (5th ed.) and also includes an API and tools for reading the dataset's XML files.
Annotated English Gigaword contains the nearly ten million documents (over four billion words) of the original English Gigaword Fifth Edition from seven news sources:
- Agence France-Presse, English Service (afp_eng)
- Associated Press Worldstream, English Service (apw_eng)
- Central News Agency of Taiwan, English Service (cna_eng)
- Los Angeles Times/Washington Post Newswire Service (ltw_eng)
- Washington Post/Bloomberg Newswire Service (wpb_eng)
- New York Times Newswire Service (nyt_eng)
- Xinhua News Agency, English Service (xin_eng)
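As a rough illustration of how the source codes above might be used, the sketch below maps a document ID to its news source. It assumes document IDs begin with the upper-cased source code followed by a date and sequence number (for example `NYT_ENG_19990101.0001`); that layout, the `SOURCES` table, and the `source_of` helper are illustrative assumptions and not part of the dataset's own API.

```python
# Source codes and names as listed in this record.
SOURCES = {
    "afp_eng": "Agence France-Presse, English Service",
    "apw_eng": "Associated Press Worldstream, English Service",
    "cna_eng": "Central News Agency of Taiwan, English Service",
    "ltw_eng": "Los Angeles Times/Washington Post Newswire Service",
    "wpb_eng": "Washington Post/Bloomberg Newswire Service",
    "nyt_eng": "New York Times Newswire Service",
    "xin_eng": "Xinhua News Agency, English Service",
}


def source_of(doc_id: str) -> str:
    """Return the source name for a document ID.

    Assumes (hypothetically) that the ID looks like 'NYT_ENG_19990101.0001',
    i.e. the first two underscore-separated fields form the source code.
    Raises KeyError if the prefix is not one of the seven known codes.
    """
    code = "_".join(doc_id.split("_")[:2]).lower()
    return SOURCES[code]


if __name__ == "__main__":
    print(source_of("NYT_ENG_19990101.0001"))
```

If the actual IDs in the XML files follow a different pattern, only the prefix extraction in `source_of` would need to change.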
The following layers of annotation were added:
- Tokenized and segmented sentences
- Treebank-style constituent parse trees
- Syntactic dependency trees
- Named entities
- In-document coreference chains
- Format
- Single study
- Title
- Annotated English Gigaword