DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Making social networks more human: A topological approach

Journal Article · Statistical Analysis and Data Mining
DOI: https://doi.org/10.1002/sam.11420 · OSTI ID:1559509

A key problem in social network analysis is identifying nonhuman interactions. State-of-the-art bot-detection systems such as Botometer train machine-learning models on user-specific data. Unfortunately, these methods do not work on datasets in which only topological information is available. In this paper, we propose a new, purely topological approach. Our method takes publicly available electronic social network datasets, for example those in the Stanford Network Analysis Project (SNAP) repository, and removes edges that connect nodes exhibiting strong evidence of nonhuman activity. Our methodology is inspired by classic work in evolutionary psychology by Dunbar that posits upper bounds on the total strength of the set of social connections in which a single human can be engaged. We model edge strength with Easley and Kleinberg's topological estimate, label nodes as “violators” if the sum of these edge strengths exceeds a Dunbar-inspired bound, and then remove the violator-to-violator edges. We run our algorithm on multiple social networks and show that our Dunbar-inspired bound appears to hold for social networks, but not for nonsocial networks. Our cleaning process classifies 0.04% of the nodes of the Twitter-2010 follower graph as violators, and we find that more than 80% of these violator nodes have Botometer scores of 0.5 or greater. Furthermore, after we remove the roughly 15 million violator-to-violator edges from the 1.2-billion-edge Twitter-2010 follower graph, 34% of the violator nodes experience a factor-of-two decrease in PageRank. PageRank is a key component of many graph algorithms, such as node/edge ranking and graph sparsification. Thus, the artificial PageRank inflation caused by these edges would bias algorithmic output and could lead to incorrect decisions based on that output.
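
The three steps named in the abstract (estimate edge strength, flag violators, remove violator-to-violator edges) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: edge strength is taken to be the neighborhood overlap described by Easley and Kleinberg (shared neighbors divided by neighbors of either endpoint, excluding the endpoints themselves), the graph is treated as undirected, and the `bound` parameter is a hypothetical placeholder, since the abstract does not state the numerical Dunbar-inspired bound. All function names are illustrative.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build undirected adjacency sets from (u, v) pairs, dropping self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def edge_strength(adj, u, v):
    """Neighborhood overlap of edge (u, v); assumed stand-in for the paper's estimate."""
    common = adj[u] & adj[v]
    either = (adj[u] | adj[v]) - {u, v}
    return len(common) / len(either) if either else 0.0


def find_violators(adj, bound):
    """Return nodes whose summed incident edge strength exceeds the bound."""
    violators = set()
    for u, neighbors in adj.items():
        total = sum(edge_strength(adj, u, v) for v in neighbors)
        if total > bound:
            violators.add(u)
    return violators


def clean_edges(edges, bound):
    """Drop every edge whose two endpoints are both violators."""
    adj = build_adjacency(edges)
    violators = find_violators(adj, bound)
    kept = [(u, v) for u, v in edges
            if not (u in violators and v in violators)]
    return kept, violators


if __name__ == "__main__":
    # Toy graph: a 5-clique (nodes 0-4) with a short tail 4-5-6.
    toy = [(a, b) for a in range(5) for b in range(a + 1, 5)] + [(4, 5), (5, 6)]
    kept, violators = clean_edges(toy, bound=2.5)  # 2.5 is an arbitrary demo value
    print(sorted(violators), kept)
```

In the toy graph the densely interconnected clique nodes accumulate high total strength while the tail nodes do not, so only clique-internal edges are candidates for removal. A real run on a graph the size of Twitter-2010 would need a far more memory-efficient representation than Python sets.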

Research Organization:
Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC04-94AL85000
OSTI ID:
1559509
Report Number(s):
SAND2019--9300J; 678306
Journal Information:
Journal Name: Statistical Analysis and Data Mining; ISSN 1932-1864
Publisher:
Wiley
Country of Publication:
United States
Language:
English
