Concretely Annotated New York Times
- URL
- https://dss2.princeton.edu/data/3324/
- Description
-
Concretely Annotated New York Times adds multiple kinds and instances of automatically generated syntactic, semantic, and coreference annotations to The New York Times Annotated Corpus. Concrete is a schema for representing structured, hierarchical, and overlapping linguistic annotations. This release provides the outputs of multiple tools that produce the same annotation types, treated as different annotation theories, under a shared tokenization. The collection contains all of the 1.8 million articles in The New York Times Annotated Corpus. Those articles were written and published by the New York Times between January 1, 1987 and June 19, 2007; the original 2008 corpus also includes metadata provided by the New York Times Newsroom, the New York Times Indexing Service, and the online production staff at nytimes.com. The following layers of annotation were added by processing the articles under the Concrete schema:
- Segmented sentences and Penn Treebank-style tokenized words
- Treebank-style constituent parse trees
- Four different syntactic dependency trees
- Named entities
- Part of speech tags
- Lemmas
- In-document entity coreference chains
- Three different frame semantic parses
- Format
- Single study
- Country
- United States
- Title
- Concretely Annotated New York Times