Making social networks more human: A topological approach
Abstract
A key problem in social network analysis is to identify nonhuman interactions. State-of-the-art bot-detection systems like Botometer train machine-learning models on user-specific data. Unfortunately, these methods do not work on data sets in which only topological information is available. In this paper, we propose a new, purely topological approach. Our method removes edges that connect nodes exhibiting strong evidence of non-human activity from publicly available electronic-social-network datasets, including, for example, those in the Stanford Network Analysis Project repository (SNAP). Our methodology is inspired by classic work in evolutionary psychology by Dunbar that posits upper bounds on the total strength of the set of social connections in which a single human can be engaged. We model edge strength with Easley and Kleinberg's topological estimate; label nodes as “violators” if the sum of these edge strengths exceeds a Dunbar-inspired bound; and then remove the violator-to-violator edges. We run our algorithm on multiple social networks and show that our Dunbar-inspired bound appears to hold for social networks, but not for nonsocial networks. Our cleaning process classifies 0.04% of the nodes of the Twitter-2010 followers graph as violators, and we find that more than 80% of these violator nodes have Botometer scores of 0.5 or greater. Furthermore, after we remove the roughly 15 million violator-violator edges from the 1.2-billion-edge Twitter-2010 follower graph, 34% of the violator nodes experience a factor-of-two decrease in PageRank. PageRank is a key component of many graph algorithms such as node/edge ranking and graph sparsification. Thus, this artificial inflation would bias algorithmic output, and result in some incorrect decisions based on this output.
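The three steps named in the abstract (estimate edge strength, label violators, remove violator-to-violator edges) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that "Easley and Kleinberg's topological estimate" refers to neighborhood overlap (shared neighbors divided by all neighbors of either endpoint, excluding the endpoints themselves). The `bound` parameter is a placeholder: the abstract does not state the numeric value of the Dunbar-inspired bound. All function names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def edge_strength(adj, u, v):
    """Neighborhood overlap of edge (u, v); assumed here to be the strength estimate."""
    common = adj[u] & adj[v]
    union = (adj[u] | adj[v]) - {u, v}
    return len(common) / len(union) if union else 0.0


def find_violators(adj, bound):
    """Return nodes whose summed incident edge strength exceeds `bound`.

    `bound` is a placeholder; the record does not give the paper's value.
    """
    violators = set()
    for u, neighbors in adj.items():
        total = sum(edge_strength(adj, u, v) for v in neighbors)
        if total > bound:
            violators.add(u)
    return violators


def remove_violator_edges(edges, violators):
    """Drop only the edges whose two endpoints are both violators."""
    return [(u, v) for u, v in edges
            if not (u in violators and v in violators)]


if __name__ == "__main__":
    # Toy graph: a 5-clique on nodes 0..4 attached to a short path 4-5-6-7.
    clique = [(i, j) for i in range(5) for j in range(i + 1, 5)]
    edges = clique + [(4, 5), (5, 6), (6, 7)]
    adj = build_adjacency(edges)
    violators = find_violators(adj, bound=3.5)  # illustrative bound only
    cleaned = remove_violator_edges(edges, violators)
    print(sorted(violators), len(cleaned))
```

Only edges with two violator endpoints are removed, so a violator keeps its connections to non-violators. On graphs the size of Twitter-2010, the per-edge set intersections would need a scalable triangle-counting approach rather than this direct computation.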
- Authors:
- Berry, Jonathan W.; Phillips, Cynthia A.; Saia, Jared
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Univ. of New Mexico, Albuquerque, NM (United States)
- Publication Date:
- Research Org.:
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA)
- OSTI Identifier:
- 1559509
- Report Number(s):
- SAND2019-9300J; Journal ID: ISSN 1932-1864; 678306
- Grant/Contract Number:
- AC04-94AL85000
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Statistical Analysis and Data Mining
- Additional Journal Information:
- Journal Name: Statistical Analysis and Data Mining; Journal ID: ISSN 1932-1864
- Publisher:
- Wiley
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Berry, Jonathan W., Phillips, Cynthia A., and Saia, Jared. Making social networks more human: A topological approach. United States: N. p., 2019. Web. doi:10.1002/sam.11420.
Berry, Jonathan W., Phillips, Cynthia A., & Saia, Jared. Making social networks more human: A topological approach. United States. doi:https://doi.org/10.1002/sam.11420
Berry, Jonathan W., Phillips, Cynthia A., and Saia, Jared. 2019. "Making social networks more human: A topological approach". United States. doi:https://doi.org/10.1002/sam.11420. https://www.osti.gov/servlets/purl/1559509.
@article{osti_1559509,
title = {Making social networks more human: A topological approach},
author = {Berry, Jonathan W. and Phillips, Cynthia A. and Saia, Jared},
abstractNote = {A key problem in social network analysis is to identify nonhuman interactions. State-of-the-art bot-detection systems like Botometer train machine-learning models on user-specific data. Unfortunately, these methods do not work on data sets in which only topological information is available. In this paper, we propose a new, purely topological approach. Our method removes edges that connect nodes exhibiting strong evidence of non-human activity from publicly available electronic-social-network datasets, including, for example, those in the Stanford Network Analysis Project repository (SNAP). Our methodology is inspired by classic work in evolutionary psychology by Dunbar that posits upper bounds on the total strength of the set of social connections in which a single human can be engaged. We model edge strength with Easley and Kleinberg's topological estimate; label nodes as “violators” if the sum of these edge strengths exceeds a Dunbar-inspired bound; and then remove the violator-to-violator edges. We run our algorithm on multiple social networks and show that our Dunbar-inspired bound appears to hold for social networks, but not for nonsocial networks. Our cleaning process classifies 0.04% of the nodes of the Twitter-2010 followers graph as violators, and we find that more than 80% of these violator nodes have Botometer scores of 0.5 or greater. Furthermore, after we remove the roughly 15 million violator-violator edges from the 1.2-billion-edge Twitter-2010 follower graph, 34% of the violator nodes experience a factor-of-two decrease in PageRank. PageRank is a key component of many graph algorithms such as node/edge ranking and graph sparsification. Thus, this artificial inflation would bias algorithmic output, and result in some incorrect decisions based on this output.},
doi = {10.1002/sam.11420},
journal = {Statistical Analysis and Data Mining},
number = {},
volume = {},
place = {United States},
year = {2019},
month = {7}
}
Works referenced in this record:
The rise of social bots
journal, June 2016
- Ferrara, Emilio; Varol, Onur; Davis, Clayton
- Communications of the ACM, Vol. 59, Issue 7
Empirical Analysis of an Evolving Social Network
journal, January 2006
- Kossinets, G.
- Science, Vol. 311, Issue 5757
A Scalable Generative Graph Model with Community Structure
journal, January 2014
- Kolda, Tamara G.; Pinar, Ali; Plantenga, Todd
- SIAM Journal on Scientific Computing, Vol. 36, Issue 5
Natural Sorting
journal, June 1962
- Baer, Robert M.; Brock, Paul
- Journal of the Society for Industrial and Applied Mathematics, Vol. 10, Issue 2
Why Do Simple Algorithms for Triangle Enumeration Work in the Real World?
journal, February 2015
- Berry, Jonathan W.; Fostvedt, Luke A.; Nordman, Daniel J.
- Internet Mathematics, Vol. 11, Issue 6
Algorithmic Complexity of Power Law Networks
conference, January 2016
- Brach, Paweł; Cygan, Marek; Łącki, Jakub
- Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms
Social cognition on the Internet: testing constraints on social network size
journal, August 2012
- Dunbar, R. I. M.
- Philosophical Transactions of the Royal Society B: Biological Sciences, Vol. 367, Issue 1599
On power-law relationships of the Internet topology
journal, October 1999
- Faloutsos, Michalis; Faloutsos, Petros; Faloutsos, Christos
- ACM SIGCOMM Computer Communication Review, Vol. 29, Issue 4
Graphs over time: densification laws, shrinking diameters and possible explanations
conference, January 2005
- Leskovec, Jure; Kleinberg, Jon; Faloutsos, Christos
- Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD '05
Power-Law Distributions in Empirical Data
journal, November 2009
- Clauset, Aaron; Shalizi, Cosma Rohilla; Newman, M. E. J.
- SIAM Review, Vol. 51, Issue 4
Cooperative Computing for Autonomous Data Centers
conference, May 2015
- Berry, Jonathan; Collins, Michael; Kearns, Aaron
- 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
BotOrNot: A System to Evaluate Social Bots
conference, January 2016
- Davis, Clayton Allen; Varol, Onur; Ferrara, Emilio
- Proceedings of the 25th International Conference Companion on World Wide Web - WWW '16 Companion
Tolerating the community detection resolution limit with edge weighting
journal, May 2011
- Berry, Jonathan W.; Hendrickson, Bruce; LaViolette, Randall A.
- Physical Review E, Vol. 83, Issue 5
DeBot: Twitter Bot Detection via Warped Correlation
conference, December 2016
- Chavoshi, Nikan; Hamooni, Hossein; Mueen, Abdullah
- 2016 IEEE 16th International Conference on Data Mining (ICDM)
Power-Law Distribution of the World Wide Web
journal, March 2000
- Adamic, Lada A.; Huberman, Bernardo A.; Barabási, A. -L.
- Science, Vol. 287, Issue 5461
On Information and Sufficiency
journal, March 1951
- Kullback, S.; Leibler, R. A.
- The Annals of Mathematical Statistics, Vol. 22, Issue 1
On power-law relationships of the Internet topology
conference, January 1999
- Faloutsos, Michalis; Faloutsos, Petros; Faloutsos, Christos
- Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication - SIGCOMM '99