U.S. Department of Energy
Office of Scientific and Technical Information

Do Programmers Prefer Predictable Expressions in Code?

Journal Article · Cognitive Science
DOI: https://doi.org/10.1111/cogs.12921 · OSTI ID: 1760357

Source code is a form of human communication, albeit one where the information shared between the programmers reading and writing the code is constrained by the requirement that the code executes correctly. Programming languages are more syntactically constrained than natural languages, but they are also very expressive, allowing a great many different ways to express even very simple computations. Still, code written by developers is highly predictable, and many programming tools have taken advantage of this phenomenon, relying on language model surprisal as a guiding mechanism. Additionally, while surprisal has been validated as a measure of cognitive load in natural language, its relation to human cognitive processes in code is still poorly understood. In this paper, we explore the relationship between surprisal and programmer preference at a small granularity: do programmers prefer more predictable expressions in code? Using meaning-preserving transformations, we produce equivalent alternatives to developer-written code expressions and run a corpus study on Java and Python projects. In general, language models rate the code expressions developers choose to write as more predictable than these transformed alternatives. Then, we perform two human subject studies asking participants to choose between two equivalent snippets of Java code with different surprisal scores (one original and one transformed). We find that programmers do prefer more predictable variants, and that stronger language models like the transformer align more often and more consistently with these preferences.
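
To make the notion of surprisal concrete, the following is a minimal sketch, not the models or transformations used in the study. It assumes a toy bigram language model with add-one smoothing, a made-up five-line corpus, and whitespace tokenization; the names `train_bigram` and `mean_surprisal` are hypothetical. It scores an expression and an operand-swapped, meaning-preserving alternative by their average per-token surprisal, where the surprisal of a token is the negative base-2 logarithm of its probability given the preceding token.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Collect context counts, bigram counts, and the vocabulary from token lines."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for line in corpus:
        tokens = ["<s>"] + line.split()      # <s> marks the start of an expression
        vocab.update(tokens[1:])             # tokens that can be predicted
        contexts.update(tokens[:-1])         # tokens that act as a preceding context
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab

def mean_surprisal(expr, model):
    """Average per-token surprisal in bits under the add-one-smoothed bigram model."""
    contexts, bigrams, vocab = model
    tokens = ["<s>"] + expr.split()
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab))
        total += -math.log2(p)               # surprisal of one token
    return total / (len(tokens) - 1)

# Hypothetical corpus in which "i + 1" is the common form (an assumption).
corpus = ["i + 1", "i + 1", "n + 1", "x = i + 1", "1 + i"]
model = train_bigram(corpus)

original = mean_surprisal("i + 1", model)
swapped = mean_surprisal("1 + i", model)     # same value, operands swapped
print(f"original: {original:.2f} bits, swapped: {swapped:.2f} bits")
```

On this toy corpus the original form scores lower (is more predictable) than the swapped one, purely because the corpus was built that way; the sketch illustrates how the comparison is set up and is not evidence for the paper's findings.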

Research Organization:
Sandia National Laboratories (SNL-CA), Livermore, CA (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); National Science Foundation (NSF)
Grant/Contract Number:
AC04-94AL85000
OSTI ID:
1760357
Report Number(s):
SAND--2020-11603J; 692040
Journal Information:
Journal Name: Cognitive Science; Journal Volume: 44; Journal Issue: 12; ISSN 0364-0213
Publisher:
Wiley, Cognitive Science Society
Country of Publication:
United States
Language:
English
