COVID-19 Twitter Dataset with Latent Topics, Sentiments and Emotions Attributes
- URL
- https://www.openicpsr.org/openicpsr/project/120321
- Description
This project presents a large dataset for researchers studying public conversation on Twitter about the COVID-19 pandemic. From 28 January 2020 to 1 September 2021, we collected over 198 million Twitter posts from more than 25 million unique users using four keywords: “corona”, “wuhan”, “nCov” and “covid”. Using topic modeling techniques and pre-trained machine learning-based emotion analysis algorithms, we labeled each tweet with seventeen semantic attributes: (a) ten binary attributes indicating whether the tweet is relevant to each of the top ten detected topics; (b) five quantitative attributes giving the intensity of valence, or sentiment (from 0: very negative to 1: very positive), and the intensity of fear, anger, happiness and sadness (each from 0: not at all to 1: extremely intense); and (c) two qualitative attributes giving the sentiment category (very negative, negative, neutral or mixed, positive, very positive) and the dominant emotion category (fear, anger, happiness, sadness, no specific emotion) that the tweet mainly expresses.
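As a rough illustration of the seventeen-attribute layout described above, the following Python sketch models one labeled tweet and checks the stated value ranges and categories. The class name, field names, and helper method are hypothetical and chosen only for this example; they are not the column names used in the released files, which should be checked against the dataset's own documentation.

```python
from dataclasses import dataclass
from typing import Tuple

# Category labels as listed in the description above.
SENTIMENT_CATEGORIES = (
    "very negative", "negative", "neutral or mixed", "positive", "very positive",
)
EMOTION_CATEGORIES = (
    "fear", "anger", "happiness", "sadness", "no specific emotion",
)
# Hypothetical field names for the five quantitative attributes.
INTENSITY_FIELDS = ("valence", "fear", "anger", "happiness", "sadness")


@dataclass(frozen=True)
class TweetAttributes:
    """One tweet's labels; field names are assumptions, not the dataset's."""
    topics: Tuple[bool, ...]  # (a) ten binary topic-relevance flags
    valence: float            # (b) 0 = very negative, 1 = very positive
    fear: float               # (b) 0 = not at all, 1 = extremely intense
    anger: float
    happiness: float
    sadness: float
    sentiment: str            # (c) sentiment category
    emotion: str              # (c) dominant emotion category

    def __post_init__(self) -> None:
        if len(self.topics) != 10:
            raise ValueError("expected exactly ten topic flags")
        for name in INTENSITY_FIELDS:
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")
        if self.sentiment not in SENTIMENT_CATEGORIES:
            raise ValueError(f"unknown sentiment category: {self.sentiment}")
        if self.emotion not in EMOTION_CATEGORIES:
            raise ValueError(f"unknown emotion category: {self.emotion}")

    def attribute_count(self) -> int:
        # Ten topic flags + five intensities + two categorical labels.
        return len(self.topics) + len(INTENSITY_FIELDS) + 2


# Made-up values for illustration only; not taken from the dataset.
example = TweetAttributes(
    topics=(True,) + (False,) * 9,
    valence=0.2, fear=0.7, anger=0.3, happiness=0.1, sadness=0.4,
    sentiment="negative", emotion="fear",
)
```

A record built this way rejects out-of-range intensities or unknown category labels at construction time, which can be a convenient sanity check when loading the data into an analysis pipeline.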
- Sample
- Format
- Series - ongoing
- Country
- Multinational/Crossnational
- United States