Title: A spark‐based big data analysis framework for real‐time sentiment prediction on streaming data

Abstract

There are many data sources that produce large volumes of data. The nature of Big Data requires new distributed processing approaches to extract valuable information. Real‐time sentiment analysis is one of the most demanding research areas and requires powerful Big Data analytics tools such as Spark. A prior literature survey has shown that, although there are many conventional sentiment analysis studies, only a few works realize sentiment analysis in real time. One major factor that affects the quality of real‐time sentiment analysis is confidence in the generated data. More precisely, it is a valuable research question to determine whether the owner who generates the sentiment is genuine. Since data generated by fake personalities may decrease the accuracy of the outcome, an intelligent service that can identify the source of data is one of the key points in the analysis. In this context, we include a fake account detection service in the proposed framework. Both the sentiment analysis and fake account detection systems are trained and tested using the Naïve Bayes model from Apache Spark's machine learning library. The developed system consists of four integrated software components: (i) a machine learning and streaming service for sentiment prediction, (ii) a Twitter streaming service to retrieve tweets, (iii) a Twitter fake account detection service to assess the owner of the retrieved tweet, and (iv) a real‐time reporting and dashboard component to visualize the results of sentiment analysis. The sentiment classification performances of the system for offline and real‐time modes are 86.77% and 80.93%, respectively.
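
The abstract names Naïve Bayes as the classifier for both services but gives no implementation detail. As a rough illustration only, the sketch below shows a multinomial Naïve Bayes text classifier with Laplace smoothing in plain Python. It is a stand-in for the idea, not the paper's Spark-based implementation: the whitespace tokenizer, the class name `MultinomialNaiveBayes`, the smoothing value, and the toy training sentences are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the paper's preprocessing is not described here)."""
    return text.lower().split()


class MultinomialNaiveBayes:
    """Minimal multinomial Naive Bayes with Laplace (add-one by default) smoothing."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.class_doc_counts = Counter()         # number of training texts per label
        self.token_counts = defaultdict(Counter)  # per-label token frequencies
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_doc_counts[label] += 1
            for token in tokenize(text):
                self.token_counts[label][token] += 1
                self.vocabulary.add(token)
        return self

    def log_scores(self, text):
        """Return log P(label) + sum of log P(token | label) for each label."""
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_doc_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.smoothing * vocab_size
            score = math.log(doc_count / total_docs)
            for token in tokenize(text):
                if token in self.vocabulary:  # tokens never seen in training are skipped
                    score += math.log((counts[token] + self.smoothing) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data, invented for illustration only.
train_texts = [
    "great phone love it",
    "love this great service",
    "terrible battery hate it",
    "hate this awful service",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = MultinomialNaiveBayes().fit(train_texts, train_labels)

# Simulate a small stream of incoming messages and classify each as it arrives.
incoming = ["love the great battery", "awful terrible service"]
predictions = [model.predict(message) for message in incoming]
```

In a streaming setting the trained model would be applied to each message as it arrives, which the final list comprehension imitates on a fixed list; a real deployment would read from a live source instead.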

Authors:
Kılınç, Deniz [1]
  1. Department of Software Engineering, Faculty of Technology, Manisa Celal Bayar University, Manisa, Turkey
Publication Date:
Sponsoring Org.:
USDOE
OSTI Identifier:
1529994
Resource Type:
Publisher's Accepted Manuscript
Journal Name:
Software, Practice and Experience
Additional Journal Information:
Journal Name: Software, Practice and Experience; Journal Volume: 49; Journal Issue: 9; Journal ID: ISSN 0038-0644
Publisher:
Wiley Blackwell (John Wiley & Sons)
Country of Publication:
United Kingdom
Language:
English

Citation Formats

Kılınç, Deniz. A spark‐based big data analysis framework for real‐time sentiment prediction on streaming data. United Kingdom: N. p., 2019. Web. doi:10.1002/spe.2724.
Kılınç, Deniz. A spark‐based big data analysis framework for real‐time sentiment prediction on streaming data. United Kingdom. https://doi.org/10.1002/spe.2724
Kılınç, Deniz. 2019. "A spark‐based big data analysis framework for real‐time sentiment prediction on streaming data". United Kingdom. https://doi.org/10.1002/spe.2724.
@article{osti_1529994,
title = {A spark‐based big data analysis framework for real‐time sentiment prediction on streaming data},
author = {Kılınç, Deniz},
doi = {10.1002/spe.2724},
journal = {Software, Practice and Experience},
number = 9,
volume = 49,
place = {United Kingdom},
year = {2019},
month = {6}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
https://doi.org/10.1002/spe.2724

Citation Metrics:
Cited by: 4 works
Citation information provided by Web of Science
