## This content will become publicly available on July 24, 2020

# Making social networks more human: A topological approach

## Abstract

A key problem in social network analysis is to identify nonhuman interactions. State-of-the-art bot-detection systems like Botometer train machine-learning models on user-specific data. Unfortunately, these methods do not work on data sets in which only topological information is available. In this paper, we propose a new, purely topological approach. Our method removes edges that connect nodes exhibiting strong evidence of nonhuman activity from publicly available electronic-social-network datasets, including, for example, those in the Stanford Network Analysis Project repository (SNAP). Our methodology is inspired by classic work in evolutionary psychology by Dunbar that posits upper bounds on the total strength of the set of social connections in which a single human can be engaged. We model edge strength with Easley and Kleinberg's topological estimate; label nodes as “violators” if the sum of these edge strengths exceeds a Dunbar-inspired bound; and then remove the violator-to-violator edges. We run our algorithm on multiple social networks and show that our Dunbar-inspired bound appears to hold for social networks, but not for nonsocial networks. Our cleaning process classifies 0.04% of the nodes of the Twitter-2010 followers graph as violators, and we find that more than 80% of these violator nodes have Botometer scores of 0.5 or greater. Furthermore, after we remove the roughly 15 million violator-violator edges from the 1.2-billion-edge Twitter-2010 follower graph, 34% of the violator nodes experience a factor-of-two decrease in PageRank. PageRank is a key component of many graph algorithms such as node/edge ranking and graph sparsification. Thus, this artificial inflation would bias algorithmic output and result in some incorrect decisions based on this output.
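The three steps named in the abstract (estimate edge strength, label violators, remove violator-to-violator edges) can be illustrated with a minimal sketch. It rests on assumptions that are not confirmed by this record: edge strength is taken to be the neighborhood overlap measure from Easley and Kleinberg's textbook, the graph is treated as undirected, and the value of `TOY_BOUND` is purely hypothetical, chosen only so that the toy graph produces violators. The function names are illustrative and are not the authors' code.

```python
def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def neighborhood_overlap(adj, u, v):
    """Assumed strength estimate: shared neighbors of u and v divided by
    all neighbors of u or v, excluding u and v themselves."""
    common = adj[u] & adj[v]
    union = (adj[u] | adj[v]) - {u, v}
    return len(common) / len(union) if union else 0.0


def find_violators(adj, bound):
    """Label a node a violator if its summed edge strength exceeds bound."""
    violators = set()
    for u, nbrs in adj.items():
        total = sum(neighborhood_overlap(adj, u, v) for v in nbrs)
        if total > bound:
            violators.add(u)
    return violators


def remove_violator_edges(adj, violators):
    """Return a copy of the graph without violator-to-violator edges."""
    cleaned = {u: set(nbrs) for u, nbrs in adj.items()}
    for u in violators:
        cleaned[u] -= violators
    return cleaned


# Toy graph: a 5-clique on nodes 0..4 with a short tail 4-5-6.
clique = [(a, b) for a in range(5) for b in range(a + 1, 5)]
toy_adj = build_graph(clique + [(4, 5), (5, 6)])

TOY_BOUND = 3.5  # hypothetical threshold, NOT the paper's Dunbar-inspired bound
toy_violators = find_violators(toy_adj, TOY_BOUND)
toy_cleaned = remove_violator_edges(toy_adj, toy_violators)

print(sorted(toy_violators))   # [0, 1, 2, 3]
print(sorted(toy_cleaned[0]))  # [4]
```

In this toy graph, nodes 0 to 3 each accumulate a total strength of 3.75 and exceed the hypothetical bound, while node 4 reaches only 3.0 because its tail neighbor dilutes its overlaps. Only edges with both endpoints in the violator set are removed, so the violators keep their links to non-violators.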

- Authors:

- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Univ. of New Mexico, Albuquerque, NM (United States)

- Publication Date:

- Research Org.:
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA)

- OSTI Identifier:
- 1559509

- Report Number(s):
- SAND2019-9300J

Journal ID: ISSN 1932-1864; 678306

- Grant/Contract Number:
- AC04-94AL85000

- Resource Type:
- Accepted Manuscript

- Journal Name:
- Statistical Analysis and Data Mining

- Additional Journal Information:
- Journal Name: Statistical Analysis and Data Mining; Journal ID: ISSN 1932-1864

- Publisher:
- Wiley

- Country of Publication:
- United States

- Language:
- English

- Subject:
- 97 MATHEMATICS AND COMPUTING

### Citation Formats

```
Berry, Jonathan W., Phillips, Cynthia A., and Saia, Jared. Making social networks more human: A topological approach. United States: N. p., 2019.
Web. doi:10.1002/sam.11420.
```

```
Berry, Jonathan W., Phillips, Cynthia A., & Saia, Jared. Making social networks more human: A topological approach. United States. doi:10.1002/sam.11420.
```

```
Berry, Jonathan W., Phillips, Cynthia A., and Saia, Jared. 2019.
"Making social networks more human: A topological approach". United States. doi:10.1002/sam.11420.
```

```
@article{osti_1559509,
  title = {Making social networks more human: A topological approach},
  author = {Berry, Jonathan W. and Phillips, Cynthia A. and Saia, Jared},
  abstractNote = {A key problem in social network analysis is to identify nonhuman interactions. State-of-the-art bot-detection systems like Botometer train machine-learning models on user-specific data. Unfortunately, these methods do not work on data sets in which only topological information is available. In this paper, we propose a new, purely topological approach. Our method removes edges that connect nodes exhibiting strong evidence of non-human activity from publicly available electronic-social-network datasets, including, for example, those in the Stanford Network Analysis Project repository (SNAP). Our methodology is inspired by classic work in evolutionary psychology by Dunbar that posits upper bounds on the total strength of the set of social connections in which a single human can be engaged. We model edge strength with Easley and Kleinberg's topological estimate; label nodes as “violators” if the sum of these edge strengths exceeds a Dunbar-inspired bound; and then remove the violator-to-violator edges. We run our algorithm on multiple social networks and show that our Dunbar-inspired bound appears to hold for social networks, but not for nonsocial networks. Our cleaning process classifies 0.04% of the nodes of the Twitter-2010 followers graph as violators, and we find that more than 80% of these violator nodes have Botometer scores of 0.5 or greater. Furthermore, after we remove the roughly 15 million violator-violator edges from the 1.2-billion-edge Twitter-2010 follower graph, 34% of the violator nodes experience a factor-of-two decrease in PageRank. PageRank is a key component of many graph algorithms such as node/edge ranking and graph sparsification. Thus, this artificial inflation would bias algorithmic output, and result in some incorrect decisions based on this output.},
  doi = {10.1002/sam.11420},
  journal = {Statistical Analysis and Data Mining},
  number = {},
  volume = {},
  place = {United States},
  year = {2019},
  month = {7}
}
```

Works referenced in this record:

## The rise of social bots

journal, June 2016

- Ferrara, Emilio; Varol, Onur; Davis, Clayton
- Communications of the ACM, Vol. 59, Issue 7

## Empirical Analysis of an Evolving Social Network

journal, January 2006

- Kossinets, G.
- Science, Vol. 311, Issue 5757

## A Scalable Generative Graph Model with Community Structure

journal, January 2014

- Kolda, Tamara G.; Pinar, Ali; Plantenga, Todd
- SIAM Journal on Scientific Computing, Vol. 36, Issue 5

## Natural Sorting

journal, June 1962

- Baer, Robert M.; Brock, Paul
- Journal of the Society for Industrial and Applied Mathematics, Vol. 10, Issue 2

## Why Do Simple Algorithms for Triangle Enumeration Work in the Real World?

journal, February 2015

- Berry, Jonathan W.; Fostvedt, Luke A.; Nordman, Daniel J.
- Internet Mathematics, Vol. 11, Issue 6

## Algorithmic Complexity of Power Law Networks

conference, January 2016

- Brach, Paweł; Cygan, Marek; Łącki, Jakub
- Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms

## Social cognition on the Internet: testing constraints on social network size

journal, August 2012

- Dunbar, R. I. M.
- Philosophical Transactions of the Royal Society B: Biological Sciences, Vol. 367, Issue 1599

## On power-law relationships of the Internet topology

journal, October 1999

- Faloutsos, Michalis; Faloutsos, Petros; Faloutsos, Christos
- ACM SIGCOMM Computer Communication Review, Vol. 29, Issue 4

## Graphs over time: densification laws, shrinking diameters and possible explanations

conference, January 2005

- Leskovec, Jure; Kleinberg, Jon; Faloutsos, Christos
- Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD '05

## Power-Law Distributions in Empirical Data

journal, November 2009

- Clauset, Aaron; Shalizi, Cosma Rohilla; Newman, M. E. J.
- SIAM Review, Vol. 51, Issue 4

## Cooperative Computing for Autonomous Data Centers

conference, May 2015

- Berry, Jonathan; Collins, Michael; Kearns, Aaron
- 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

## BotOrNot: A System to Evaluate Social Bots

conference, January 2016

- Davis, Clayton Allen; Varol, Onur; Ferrara, Emilio
- Proceedings of the 25th International Conference Companion on World Wide Web - WWW '16 Companion

## Tolerating the community detection resolution limit with edge weighting

journal, May 2011

- Berry, Jonathan W.; Hendrickson, Bruce; LaViolette, Randall A.
- Physical Review E, Vol. 83, Issue 5

## DeBot: Twitter Bot Detection via Warped Correlation

conference, December 2016

- Chavoshi, Nikan; Hamooni, Hossein; Mueen, Abdullah
- 2016 IEEE 16th International Conference on Data Mining (ICDM)

## Power-Law Distribution of the World Wide Web

journal, March 2000

- Adamic, Lada A.; Huberman, Bernardo A.; Barabási, A. -L.
- Science, Vol. 287, Issue 5461

## On Information and Sufficiency

journal, March 1951

- Kullback, S.; Leibler, R. A.
- The Annals of Mathematical Statistics, Vol. 22, Issue 1