Do Programmers Prefer Predictable Expressions in Code?
- Univ. of California, Davis, CA (United States)
Source code is a form of human communication, albeit one where the information shared between the programmers reading and writing the code is constrained by the requirement that the code executes correctly. Programming languages are more syntactically constrained than natural languages, but they are also very expressive, allowing a great many different ways to express even very simple computations. Still, code written by developers is highly predictable, and many programming tools have taken advantage of this phenomenon, relying on language model surprisal as a guiding mechanism. Yet although surprisal has been validated as a measure of cognitive load in natural language, its relation to human cognitive processes in code is still poorly understood. In this paper, we explore the relationship between surprisal and programmer preference at a small granularity: do programmers prefer more predictable expressions in code? Using meaning-preserving transformations, we produce equivalent alternatives to developer-written code expressions and run a corpus study on Java and Python projects. In general, language models rate the code expressions developers choose to write as more predictable than these transformed alternatives. Then, we perform two human subject studies asking participants to choose between two equivalent snippets of Java code with different surprisal scores (one original and one transformed). We find that programmers do prefer more predictable variants, and that stronger language models like the transformer align more often and more consistently with these preferences.
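To make the notion of surprisal concrete, the following is a minimal sketch of how an original expression and a meaning-preserving transformation of it could be scored. It is not the method used in the paper: the toy corpus, the add-one-smoothed bigram model, and the names `train_bigram` and `mean_surprisal` are all assumptions chosen for illustration, standing in for the much stronger language models the abstract mentions.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count bigrams over tokenized expressions, using "<s>" as a start marker."""
    context_counts, bigram_counts = Counter(), Counter()
    vocab = set()
    for tokens in corpus:
        seq = ["<s>"] + tokens
        vocab.update(seq)
        for prev, cur in zip(seq, seq[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab

def mean_surprisal(tokens, model):
    """Average per-token surprisal in bits: the mean of -log2 P(token | previous token)."""
    context_counts, bigram_counts, vocab = model
    seq = ["<s>"] + tokens
    v = len(vocab) + 1  # add-one smoothing, with one extra slot for unseen tokens
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + v)
        total += -math.log2(p)
    return total / len(tokens)

# Hypothetical toy corpus of tokenized expressions (illustrative only).
corpus = [
    ["i", "+", "1"],
    ["i", "+", "1"],
    ["j", "+", "1"],
    ["n", "-", "1"],
    ["1", "+", "k"],
]
model = train_bigram(corpus)

original = ["i", "+", "1"]     # the form a developer wrote
transformed = ["1", "+", "i"]  # operands swapped; same value for integer addition

print(f"original:    {mean_surprisal(original, model):.3f} bits/token")
print(f"transformed: {mean_surprisal(transformed, model):.3f} bits/token")
```

Under this toy model the swapped form scores a higher mean surprisal than the original, because its token transitions are rarer in the corpus. That is the kind of comparison the corpus study describes, although real results depend entirely on the model and training data used.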
- Research Organization:
- Sandia National Laboratories (SNL-CA), Livermore, CA (United States)
- Sponsoring Organization:
- USDOE National Nuclear Security Administration (NNSA); National Science Foundation (NSF)
- Grant/Contract Number:
- AC04-94AL85000
- OSTI ID:
- 1760357
- Report Number(s):
- SAND--2020-11603J; 692040
- Journal Information:
- Journal Name: Cognitive Science; Journal Volume: 44; Journal Issue: 12; ISSN 0364-0213
- Publisher:
- Wiley, Cognitive Science Society
- Country of Publication:
- United States
- Language:
- English