Making social networks more human: A topological approach
Abstract
A key problem in social network analysis is to identify nonhuman interactions. State-of-the-art bot-detection systems like Botometer train machine-learning models on user-specific data. Unfortunately, these methods do not work on data sets in which only topological information is available. In this paper, we propose a new, purely topological approach. Our method removes edges that connect nodes exhibiting strong evidence of nonhuman activity from publicly available electronic-social-network datasets, including, for example, those in the Stanford Network Analysis Project repository (SNAP). Our methodology is inspired by classic work in evolutionary psychology by Dunbar that posits upper bounds on the total strength of the set of social connections in which a single human can be engaged. We model edge strength with Easley and Kleinberg's topological estimate; label nodes as "violators" if the sum of these edge strengths exceeds a Dunbar-inspired bound; and then remove the violator-to-violator edges. We run our algorithm on multiple social networks and show that our Dunbar-inspired bound appears to hold for social networks, but not for nonsocial networks. Our cleaning process classifies 0.04% of the nodes of the Twitter-2010 followers graph as violators, and we find that more than 80% of these violator nodes have Botometer scores of 0.5 or greater. Furthermore, after we remove the roughly 15 million violator-violator edges from the 1.2-billion-edge Twitter-2010 follower graph, 34% of the violator nodes experience a factor-of-two decrease in PageRank. PageRank is a key component of many graph algorithms such as node/edge ranking and graph sparsification. Thus, this artificial inflation would bias algorithmic output, and result in some incorrect decisions based on this output.
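The three steps named in the abstract (estimate edge strength, flag violators, remove violator-to-violator edges) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that the topological estimate is the neighborhood overlap of an edge's endpoints, treats the graph as undirected, and takes the bound as a plain parameter; the function names and the `bound` value used in any example are hypothetical and do not come from the paper.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}, skipping self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def neighborhood_overlap(adj, u, v):
    """Assumed strength estimate for edge (u, v): shared neighbors divided by
    all neighbors of either endpoint, not counting u and v themselves."""
    common = adj[u] & adj[v]
    union = (adj[u] | adj[v]) - {u, v}
    return len(common) / len(union) if union else 0.0


def find_violators(adj, bound):
    """Return the nodes whose summed incident edge strength exceeds `bound`.
    `bound` is an illustrative placeholder, not the paper's Dunbar-inspired value."""
    violators = set()
    for u, neighbors in adj.items():
        total_strength = sum(neighborhood_overlap(adj, u, v) for v in neighbors)
        if total_strength > bound:
            violators.add(u)
    return violators


def clean_edges(edges, bound):
    """Drop every edge whose two endpoints are both violators."""
    adj = build_adjacency(edges)
    violators = find_violators(adj, bound)
    kept = [(u, v) for u, v in edges
            if not (u in violators and v in violators)]
    return kept, violators
```

Calling `clean_edges(edge_list, bound)` returns the surviving edges together with the flagged node set, so edges between a violator and a non-violator are kept, as the abstract describes. A follower graph such as Twitter-2010 is directed, so a faithful version would have to decide how direction enters the strength estimate; that choice is left open here.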
 Authors:

 Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
 Univ. of New Mexico, Albuquerque, NM (United States)
 Publication Date:
 Research Org.:
 Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
 Sponsoring Org.:
 USDOE National Nuclear Security Administration (NNSA)
 OSTI Identifier:
 1559509
 Report Number(s):
 SAND2019-9300J
Journal ID: ISSN 1932-1864; 678306
 Grant/Contract Number:
 AC04-94AL85000
 Resource Type:
 Accepted Manuscript
 Journal Name:
 Statistical Analysis and Data Mining
 Additional Journal Information:
 Journal Name: Statistical Analysis and Data Mining; Journal ID: ISSN 1932-1864
 Publisher:
 Wiley
 Country of Publication:
 United States
 Language:
 English
 Subject:
 97 MATHEMATICS AND COMPUTING
Citation Formats
Berry, Jonathan W., Phillips, Cynthia A., and Saia, Jared. Making social networks more human: A topological approach. United States: N. p., 2019.
Web. doi:10.1002/sam.11420.
Berry, Jonathan W., Phillips, Cynthia A., & Saia, Jared. Making social networks more human: A topological approach. United States. https://doi.org/10.1002/sam.11420
Berry, Jonathan W., Phillips, Cynthia A., and Saia, Jared. Wed .
"Making social networks more human: A topological approach". United States. https://doi.org/10.1002/sam.11420. https://www.osti.gov/servlets/purl/1559509.
@article{osti_1559509,
title = {Making social networks more human: A topological approach},
author = {Berry, Jonathan W. and Phillips, Cynthia A. and Saia, Jared},
abstractNote = {A key problem in social network analysis is to identify nonhuman interactions. State-of-the-art bot-detection systems like Botometer train machine-learning models on user-specific data. Unfortunately, these methods do not work on data sets in which only topological information is available. In this paper, we propose a new, purely topological approach. Our method removes edges that connect nodes exhibiting strong evidence of nonhuman activity from publicly available electronic-social-network datasets, including, for example, those in the Stanford Network Analysis Project repository (SNAP). Our methodology is inspired by classic work in evolutionary psychology by Dunbar that posits upper bounds on the total strength of the set of social connections in which a single human can be engaged. We model edge strength with Easley and Kleinberg's topological estimate; label nodes as "violators" if the sum of these edge strengths exceeds a Dunbar-inspired bound; and then remove the violator-to-violator edges. We run our algorithm on multiple social networks and show that our Dunbar-inspired bound appears to hold for social networks, but not for nonsocial networks. Our cleaning process classifies 0.04% of the nodes of the Twitter-2010 followers graph as violators, and we find that more than 80% of these violator nodes have Botometer scores of 0.5 or greater. Furthermore, after we remove the roughly 15 million violator-violator edges from the 1.2-billion-edge Twitter-2010 follower graph, 34% of the violator nodes experience a factor-of-two decrease in PageRank. PageRank is a key component of many graph algorithms such as node/edge ranking and graph sparsification. Thus, this artificial inflation would bias algorithmic output, and result in some incorrect decisions based on this output.},
doi = {10.1002/sam.11420},
journal = {Statistical Analysis and Data Mining},
number = {},
volume = {},
place = {United States},
year = {2019},
month = {7}
}
Works referenced in this record:
The rise of social bots
journal, June 2016
 Ferrara, Emilio; Varol, Onur; Davis, Clayton
 Communications of the ACM, Vol. 59, Issue 7
Empirical Analysis of an Evolving Social Network
journal, January 2006
 Kossinets, G.
 Science, Vol. 311, Issue 5757
A Scalable Generative Graph Model with Community Structure
journal, January 2014
 Kolda, Tamara G.; Pinar, Ali; Plantenga, Todd
 SIAM Journal on Scientific Computing, Vol. 36, Issue 5
Natural Sorting
journal, June 1962
 Baer, Robert M.; Brock, Paul
 Journal of the Society for Industrial and Applied Mathematics, Vol. 10, Issue 2
Why Do Simple Algorithms for Triangle Enumeration Work in the Real World?
journal, February 2015
 Berry, Jonathan W.; Fostvedt, Luke A.; Nordman, Daniel J.
 Internet Mathematics, Vol. 11, Issue 6
Algorithmic Complexity of Power Law Networks
conference, January 2016
 Brach, Paweł; Cygan, Marek; Łącki, Jakub
 Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms
On power-law relationships of the Internet topology
conference, January 1999
 Faloutsos, Michalis; Faloutsos, Petros; Faloutsos, Christos
 Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication - SIGCOMM '99
Social cognition on the Internet: testing constraints on social network size
journal, August 2012
 Dunbar, R. I. M.
 Philosophical Transactions of the Royal Society B: Biological Sciences, Vol. 367, Issue 1599
On power-law relationships of the Internet topology
journal, October 1999
 Faloutsos, Michalis; Faloutsos, Petros; Faloutsos, Christos
 ACM SIGCOMM Computer Communication Review, Vol. 29, Issue 4
Graphs over time: densification laws, shrinking diameters and possible explanations
conference, January 2005
 Leskovec, Jure; Kleinberg, Jon; Faloutsos, Christos
 Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD '05
Power-Law Distributions in Empirical Data
journal, November 2009
 Clauset, Aaron; Shalizi, Cosma Rohilla; Newman, M. E. J.
 SIAM Review, Vol. 51, Issue 4
Cooperative Computing for Autonomous Data Centers
conference, May 2015
 Berry, Jonathan; Collins, Michael; Kearns, Aaron
 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
BotOrNot: A System to Evaluate Social Bots
conference, January 2016
 Davis, Clayton Allen; Varol, Onur; Ferrara, Emilio
 Proceedings of the 25th International Conference Companion on World Wide Web - WWW '16 Companion
Tolerating the community detection resolution limit with edge weighting
journal, May 2011
 Berry, Jonathan W.; Hendrickson, Bruce; LaViolette, Randall A.
 Physical Review E, Vol. 83, Issue 5
DeBot: Twitter Bot Detection via Warped Correlation
conference, December 2016
 Chavoshi, Nikan; Hamooni, Hossein; Mueen, Abdullah
 2016 IEEE 16th International Conference on Data Mining (ICDM)
Power-Law Distribution of the World Wide Web
journal, March 2000
 Adamic, Lada A.; Huberman, Bernardo A.; Barabási, A. L.
 Science, Vol. 287, Issue 5461
On Information and Sufficiency
journal, March 1951
 Kullback, S.; Leibler, R. A.
 The Annals of Mathematical Statistics, Vol. 22, Issue 1
Fast maximum clique algorithms for large graphs
conference, April 2014
 Rossi, Ryan A.; Gleich, David F.; Gebremedhin, Assefaw H.
 WWW '14: 23rd International World Wide Web Conference, Proceedings of the 23rd International Conference on World Wide Web
Why do simple algorithms for triangle enumeration work in the real world?
conference, January 2014
 Berry, Jonathan W.; Fostvedt, Luke K.; Nordman, Daniel J.
 ITCS'14: Innovations in Theoretical Computer Science, Proceedings of the 5th conference on Innovations in theoretical computer science
Power-law distributions in empirical data
text, January 2018
 Clauset, Aaron; Shalizi, Cosma; Newman, M. E. J.
 Figshare
On Power-Law Relationships of the Internet Topology
text, January 1984
 Faloutsos, Michalis; Faloutsos, Petros; Faloutsos, Christos
 Carnegie Mellon University
Power-law distributions in empirical data
text, January 2007
 Clauset, Aaron; Shalizi, Cosma Rohilla; Newman, M. E. J.
 arXiv
A Scalable Generative Graph Model with Community Structure
text, January 2013
 Kolda, Tamara G.; Pinar, Ali; Plantenga, Todd
 arXiv