DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

This content will become publicly available on July 24, 2020

Title: Making social networks more human: A topological approach

Abstract

A key problem in social network analysis is to identify nonhuman interactions. State-of-the-art bot-detection systems like Botometer train machine-learning models on user-specific data. Unfortunately, these methods do not work on data sets in which only topological information is available. In this paper, we propose a new, purely topological approach. Our method removes edges that connect nodes exhibiting strong evidence of non-human activity from publicly available electronic-social-network datasets, including, for example, those in the Stanford Network Analysis Project repository (SNAP). Our methodology is inspired by classic work in evolutionary psychology by Dunbar that posits upper bounds on the total strength of the set of social connections in which a single human can be engaged. We model edge strength with Easley and Kleinberg's topological estimate; label nodes as “violators” if the sum of these edge strengths exceeds a Dunbar-inspired bound; and then remove the violator-to-violator edges. We run our algorithm on multiple social networks and show that our Dunbar-inspired bound appears to hold for social networks, but not for nonsocial networks. Our cleaning process classifies 0.04% of the nodes of the Twitter-2010 followers graph as violators, and we find that more than 80% of these violator nodes have Botometer scores of 0.5 or greater. Furthermore, after we remove the roughly 15 million violator-violator edges from the 1.2-billion-edge Twitter-2010 follower graph, 34% of the violator nodes experience a factor-of-two decrease in PageRank. PageRank is a key component of many graph algorithms such as node/edge ranking and graph sparsification. Thus, this artificial inflation would bias algorithmic output, and result in some incorrect decisions based on this output.
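The cleaning step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it treats the graph as undirected, assumes that Easley and Kleinberg's topological estimate is their neighborhood-overlap measure (shared neighbors divided by neighbors of either endpoint, excluding the endpoints), and uses a hypothetical `bound` parameter in place of the paper's Dunbar-inspired threshold, whose value is not stated in this record. The function names and the toy graph are likewise illustrative.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list; self-loops are dropped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def neighborhood_overlap(adj, u, v):
    """Assumed edge-strength estimate: neighbors shared by u and v, divided by
    neighbors of either endpoint other than u and v (0.0 if there are none)."""
    common = adj[u] & adj[v]
    either = (adj[u] | adj[v]) - {u, v}
    return len(common) / len(either) if either else 0.0


def clean_graph(edges, bound):
    """Label violators and drop violator-violator edges.

    `bound` is a hypothetical stand-in for the Dunbar-inspired threshold.
    Returns (violators, kept_edges), each kept edge as a sorted tuple.
    """
    adj = build_adjacency(edges)
    undirected = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    strength = defaultdict(float)  # node -> sum of its edge strengths
    for u, v in undirected:
        w = neighborhood_overlap(adj, u, v)
        strength[u] += w
        strength[v] += w
    violators = {node for node, s in strength.items() if s > bound}
    kept = sorted(e for e in undirected
                  if not (e[0] in violators and e[1] in violators))
    return violators, kept


# Hypothetical toy graph: a 6-node clique (10..15), two small "human"
# triangles joined by edge (2, 3), and one clique-to-human edge (0, 10).
edges = [(a, b) for a in range(10, 16) for b in range(a + 1, 16)]
edges += [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5), (0, 10)]
violators, kept = clean_graph(edges, bound=3.0)
print(sorted(violators))  # [10, 11, 12, 13, 14, 15]
print(len(kept))          # 8: all 15 clique edges are removed
```

In this toy example, every clique member accumulates a strength of roughly 4 or more and is flagged, while each triangle node stays below 2. The edge (0, 10) survives because only one of its endpoints is a violator. Per-edge set intersections are adequate for a sketch; a graph the size of Twitter-2010 would call for a far more scalable implementation.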

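The PageRank effect reported in the abstract (a factor-of-two drop for 34% of violators on Twitter-2010) can be measured by scoring the graph before and after cleaning and counting how many violators at least halve. The sketch below assumes a plain power-iteration PageRank with damping 0.85 and uniform redistribution of rank held by nodes with no out-edges. These settings, the helper names `pagerank` and `halved_fraction`, and the toy follower graph are hypothetical and are not taken from the paper.

```python
def pagerank(nodes, edges, damping=0.85, iters=100):
    """Power-iteration PageRank on a directed edge list.

    Rank held by dangling nodes (no out-edges) is spread uniformly,
    so the scores keep summing to 1.
    """
    nodes = list(nodes)
    n = len(nodes)
    out = {u: [] for u in nodes}
    for u, v in edges:
        out[u].append(v)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        dangling = sum(rank[u] for u in nodes if not out[u])
        base = (1.0 - damping) / n + damping * dangling / n
        new = {u: base for u in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


def halved_fraction(before, after, targets):
    """Fraction of `targets` whose PageRank at least halves from before to after."""
    targets = list(targets)
    if not targets:
        return 0.0
    return sum(after[t] <= before[t] / 2 for t in targets) / len(targets)


# Hypothetical toy follower graph: bots 10..14 all follow each other and
# human 0; humans follow each other in the cycle 0 -> 1 -> 2 -> 0.
bots = range(10, 15)
humans = [0, 1, 2]
bot_edges = [(a, b) for a in bots for b in bots if a != b]
other_edges = [(b, 0) for b in bots] + [(0, 1), (1, 2), (2, 0)]
nodes = humans + list(bots)
before = pagerank(nodes, bot_edges + other_edges)
after = pagerank(nodes, other_edges)  # violator-violator edges removed
print(halved_fraction(before, after, bots))  # 1.0
```

Here each bot's score falls from about 0.059 to the teleport floor of 0.15/8 = 0.01875 once the mutual bot edges are gone, because nothing else links to the bots.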
Authors:
Berry, Jonathan W. [1]; Phillips, Cynthia A. [1]; Saia, Jared [2]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  2. Univ. of New Mexico, Albuquerque, NM (United States)
Publication Date:
July 2019
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1559509
Report Number(s):
SAND2019-9300J
Journal ID: ISSN 1932-1864; 678306
Grant/Contract Number:  
AC04-94AL85000
Resource Type:
Accepted Manuscript
Journal Name:
Statistical Analysis and Data Mining
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Berry, Jonathan W., Phillips, Cynthia A., and Saia, Jared. Making social networks more human: A topological approach. United States: N. p., 2019. Web. doi:10.1002/sam.11420.
Berry, Jonathan W., Phillips, Cynthia A., & Saia, Jared. Making social networks more human: A topological approach. United States. doi:10.1002/sam.11420.
Berry, Jonathan W., Phillips, Cynthia A., and Saia, Jared. 2019. "Making social networks more human: A topological approach". United States. doi:10.1002/sam.11420.
@article{osti_1559509,
title = {Making social networks more human: A topological approach},
author = {Berry, Jonathan W. and Phillips, Cynthia A. and Saia, Jared},
abstractNote = {A key problem in social network analysis is to identify nonhuman interactions. State-of-the-art bot-detection systems like Botometer train machine-learning models on user-specific data. Unfortunately, these methods do not work on data sets in which only topological information is available. In this paper, we propose a new, purely topological approach. Our method removes edges that connect nodes exhibiting strong evidence of non-human activity from publicly available electronic-social-network datasets, including, for example, those in the Stanford Network Analysis Project repository (SNAP). Our methodology is inspired by classic work in evolutionary psychology by Dunbar that posits upper bounds on the total strength of the set of social connections in which a single human can be engaged. We model edge strength with Easley and Kleinberg's topological estimate; label nodes as “violators” if the sum of these edge strengths exceeds a Dunbar-inspired bound; and then remove the violator-to-violator edges. We run our algorithm on multiple social networks and show that our Dunbar-inspired bound appears to hold for social networks, but not for nonsocial networks. Our cleaning process classifies 0.04% of the nodes of the Twitter-2010 followers graph as violators, and we find that more than 80% of these violator nodes have Botometer scores of 0.5 or greater. Furthermore, after we remove the roughly 15 million violator-violator edges from the 1.2-billion-edge Twitter-2010 follower graph, 34% of the violator nodes experience a factor-of-two decrease in PageRank. PageRank is a key component of many graph algorithms such as node/edge ranking and graph sparsification. Thus, this artificial inflation would bias algorithmic output, and result in some incorrect decisions based on this output.},
doi = {10.1002/sam.11420},
journal = {Statistical Analysis and Data Mining},
place = {United States},
year = {2019},
month = {7}
}

Works referenced in this record:

The rise of social bots
journal, June 2016

  • Ferrara, Emilio; Varol, Onur; Davis, Clayton
  • Communications of the ACM, Vol. 59, Issue 7
  • DOI: 10.1145/2818717

Empirical Analysis of an Evolving Social Network
journal, January 2006


A Scalable Generative Graph Model with Community Structure
journal, January 2014

  • Kolda, Tamara G.; Pinar, Ali; Plantenga, Todd
  • SIAM Journal on Scientific Computing, Vol. 36, Issue 5
  • DOI: 10.1137/130914218

Natural Sorting
journal, June 1962

  • Baer, Robert M.; Brock, Paul
  • Journal of the Society for Industrial and Applied Mathematics, Vol. 10, Issue 2
  • DOI: 10.1137/0110021

Why Do Simple Algorithms for Triangle Enumeration Work in the Real World?
journal, February 2015


Algorithmic Complexity of Power Law Networks
conference, January 2016

  • Brach, Paweł; Cygan, Marek; Łącki, Jakub
  • Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms
  • DOI: 10.1137/1.9781611974331.ch91

Social cognition on the Internet: testing constraints on social network size
journal, August 2012

  • Dunbar, R. I. M.
  • Philosophical Transactions of the Royal Society B: Biological Sciences, Vol. 367, Issue 1599
  • DOI: 10.1098/rstb.2012.0121

On power-law relationships of the Internet topology
journal, October 1999

  • Faloutsos, Michalis; Faloutsos, Petros; Faloutsos, Christos
  • ACM SIGCOMM Computer Communication Review, Vol. 29, Issue 4
  • DOI: 10.1145/316194.316229

Graphs over time: densification laws, shrinking diameters and possible explanations
conference, January 2005

  • Leskovec, Jure; Kleinberg, Jon; Faloutsos, Christos
  • Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD '05
  • DOI: 10.1145/1081870.1081893

Power-Law Distributions in Empirical Data
journal, November 2009

  • Clauset, Aaron; Shalizi, Cosma Rohilla; Newman, M. E. J.
  • SIAM Review, Vol. 51, Issue 4
  • DOI: 10.1137/070710111

Cooperative Computing for Autonomous Data Centers
conference, May 2015

  • Berry, Jonathan; Collins, Michael; Kearns, Aaron
  • 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
  • DOI: 10.1109/IPDPS.2015.109

BotOrNot: A System to Evaluate Social Bots
conference, January 2016

  • Davis, Clayton Allen; Varol, Onur; Ferrara, Emilio
  • Proceedings of the 25th International Conference Companion on World Wide Web - WWW '16 Companion
  • DOI: 10.1145/2872518.2889302

Tolerating the community detection resolution limit with edge weighting
journal, May 2011

  • Berry, Jonathan W.; Hendrickson, Bruce; LaViolette, Randall A.
  • Physical Review E, Vol. 83, Issue 5
  • DOI: 10.1103/PhysRevE.83.056119

DeBot: Twitter Bot Detection via Warped Correlation
conference, December 2016

  • Chavoshi, Nikan; Hamooni, Hossein; Mueen, Abdullah
  • 2016 IEEE 16th International Conference on Data Mining (ICDM)
  • DOI: 10.1109/ICDM.2016.0096

Power-Law Distribution of the World Wide Web
journal, March 2000


On Information and Sufficiency
journal, March 1951

  • Kullback, S.; Leibler, R. A.
  • The Annals of Mathematical Statistics, Vol. 22, Issue 1
  • DOI: 10.1214/aoms/1177729694