OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian

Abstract

This paper presents RuSentiment, a new dataset for sentiment analysis of social media posts in Russian, and a new set of comprehensive annotation guidelines that are extensible to other languages. RuSentiment is currently the largest in its class for Russian, with 30,521 posts annotated with Cohen’s kappa of 0.58 (3 annotations per post). To diversify the dataset, 6,749 posts were pre-selected with an active learning-style strategy. We report baseline classification results and release the best-performing embeddings, trained on 3.2B tokens of Russian VKontakte posts.

Authors:
 Rogers, Anna [1]; Romanov, Alexey [1]; Rumshisky, Anna [1]; Volkova, Svitlana [2]; Gronas, Mikhail [3]; Gribov, Alex [1]
  1. University of Massachusetts at Lowell
  2. BATTELLE (PACIFIC NW LAB)
  3. Dartmouth College
Publication Date:
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1529951
Report Number(s):
PNNL-SA-134041
DOE Contract Number:  
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018), August, 2018, Santa Fe, NM
Country of Publication:
United States
Language:
English
Subject:
data collection, sentiment analysis, machine learning, natural language processing

Citation Formats

Rogers, Anna, Romanov, Alexey, Rumshisky, Anna, Volkova, Svitlana, Gronas, Mikhail, and Gribov, Alex. RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian. United States: N. p., 2018. Web.
Rogers, Anna, Romanov, Alexey, Rumshisky, Anna, Volkova, Svitlana, Gronas, Mikhail, & Gribov, Alex. RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian. United States.
Rogers, Anna, Romanov, Alexey, Rumshisky, Anna, Volkova, Svitlana, Gronas, Mikhail, and Gribov, Alex. 2018. "RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian". United States.
@article{osti_1529951,
title = {RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian},
author = {Rogers, Anna and Romanov, Alexey and Rumshisky, Anna and Volkova, Svitlana and Gronas, Mikhail and Gribov, Alex},
abstractNote = {This paper presents RuSentiment, a new dataset for sentiment analysis of social media posts in Russian, and a new set of comprehensive annotation guidelines that are extensible to other languages. RuSentiment is currently the largest in its class for Russian, with 30,521 posts annotated with Cohen’s kappa of 0.58 (3 annotations per post). To diversify the dataset, 6,749 posts were pre-selected with an active learning-style strategy. We report baseline classification results and release the best-performing embeddings, trained on 3.2B tokens of Russian VKontakte posts.},
place = {United States},
year = {2018},
month = {8}
}

