Do Programmers Prefer Predictable Expressions in Code?

Casalnuovo, Casey; Lee, Kevin; Wang, Hulin; Devanbu, Prem; Morgan, Emily

doi:10.1111/cogs.12921

Journal Article · Fri Dec 11 23:00:00 EST 2020 · Cognitive Science

DOI: https://doi.org/10.1111/cogs.12921 · OSTI ID: 1760357

Author affiliation (all authors): Univ. of California, Davis, CA (United States)

Source code is a form of human communication, albeit one where the information shared between the programmers reading and writing the code is constrained by the requirement that the code executes correctly. Programming languages are more syntactically constrained than natural languages, but they are also very expressive, allowing a great many different ways to express even very simple computations. Still, code written by developers is highly predictable, and many programming tools have taken advantage of this phenomenon, relying on language model surprisal as a guiding mechanism. However, while surprisal has been validated as a measure of cognitive load in natural language, its relation to human cognitive processes in code is still poorly understood. In this paper, we explore the relationship between surprisal and programmer preference at a fine granularity: do programmers prefer more predictable expressions in code? Using meaning-preserving transformations, we produce equivalent alternatives to developer-written code expressions and run a corpus study on Java and Python projects. In general, language models rate the code expressions developers choose to write as more predictable than these transformed alternatives. Then, we perform two human subject studies asking participants to choose between two equivalent snippets of Java code with different surprisal scores (one original and one transformed). We find that programmers do prefer more predictable variants, and that stronger language models like the transformer align more often and more consistently with these preferences.
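
To make the notion of surprisal in the abstract concrete, the following is a minimal sketch, not the models, corpus, or transformations used in the paper. It assumes a toy bigram language model with add-one smoothing trained on four hand-written token sequences, and compares the mean per-token surprisal (in bits, the negative base-2 logarithm of each token's conditional probability) of an expression against an equivalent variant with swapped operands. All names here (`train_bigram`, `mean_surprisal`, the toy corpus) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count contexts and bigrams over tokenized lines (toy model, illustrative only)."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for tokens in corpus:
        padded = ["<s>"] + tokens          # "<s>" marks the start of a line
        vocab.update(tokens)
        contexts.update(padded[:-1])       # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))
    return contexts, bigrams, vocab

def mean_surprisal(tokens, model):
    """Mean surprisal in bits: average of -log2 P(token | previous token)."""
    contexts, bigrams, vocab = model
    vocab_size = len(vocab) + 1            # +1 reserves probability mass for unseen tokens
    padded = ["<s>"] + tokens
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        # Add-one (Laplace) smoothing keeps unseen bigrams at non-zero probability.
        prob = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        total += -math.log2(prob)
    return total / len(tokens)

# Hypothetical toy corpus in which "variable + 1" is the common ordering.
corpus = [
    ["x", "=", "i", "+", "1"],
    ["y", "=", "j", "+", "1"],
    ["n", "=", "n", "+", "1"],
    ["z", "=", "2", "*", "k"],
]
model = train_bigram(corpus)

original = ["x", "=", "i", "+", "1"]       # expression as a developer might write it
transformed = ["x", "=", "1", "+", "i"]    # meaning-preserving operand swap

s_original = mean_surprisal(original, model)
s_transformed = mean_surprisal(transformed, model)
print(f"original:    {s_original:.2f} bits/token")
print(f"transformed: {s_transformed:.2f} bits/token")
```

Under this toy model the swapped variant receives the higher mean surprisal, because the orderings `= 1`, `1 +`, and `+ i` never occur in the training sequences. The studies described in the abstract rely on much stronger language models, so this sketch only illustrates the quantity being compared.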

Research Organization: Sandia National Laboratories (SNL-CA), Livermore, CA (United States)

Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA); National Science Foundation (NSF)

Grant/Contract Number: AC04-94AL85000

OSTI ID: 1760357

Report Number(s): SAND--2020-11603J; 692040

Journal Information: Cognitive Science, Vol. 44, Issue 12; ISSN 0364-0213

Publisher: Wiley, Cognitive Science Society

Country of Publication: United States

Language: English

Related Subjects

97 MATHEMATICS AND COMPUTING
Surprisal
dual channel constraints
human preference
language models
meaning-preserving transformations
source code expressions