Studying language evolution in the age of big data
Journal Article · Journal of Language Evolution
- Santa Fe Inst. (SFI), Santa Fe, NM (United States); Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Univ. of Leipzig (Germany); Max Planck Inst. for Mathematics in the Sciences, Leipzig (Germany)
- Univ. of Zurich (Switzerland); Max Planck Inst. for the Science of Human History, Jena (Germany)
- Univ. of New Mexico, Albuquerque, NM (United States)
- Philipps Univ. Marburg (Germany)
- Arizona State Univ., Tempe, AZ (United States)
- Univ. of Leipzig (Germany)
- Santa Fe Inst. (SFI), Santa Fe, NM (United States); Georgia Inst. of Technology, Atlanta, GA (United States); Tokyo Inst. of Technology (Japan)
- Santa Fe Inst. (SFI), Santa Fe, NM (United States); Tokyo Inst. of Technology (Japan); Univ. of Leipzig (Germany)
- Santa Fe Inst. (SFI), Santa Fe, NM (United States); Russian State Univ. for the Humanities, Moscow (Russian Federation); Higher School of Economics, Moscow (Russian Federation)
- Northwestern Inst. on Complex Systems, Evanston, IL (United States); Northwestern Univ., Evanston, IL (United States). Dept. of Chemistry
The increasing availability of large digital corpora of cross-linguistic data is revolutionizing many branches of linguistics. Overall, it has triggered a shift of attention from detailed questions about individual features to more global patterns amenable to rigorous, but statistical, analyses. This engenders an approach based on successive approximations where models with simplified assumptions result in frameworks that can then be systematically refined, always keeping explicit the methodological commitments and the assumed prior knowledge. Therefore, they can resolve disputes between competing frameworks quantitatively by separating the support provided by the data from the underlying assumptions. These methods, though, often appear as a ‘black box’ to traditional practitioners. In fact, the switch to a statistical view complicates comparison of the results from these newer methods with traditional understanding, sometimes leading to misinterpretation and overly broad claims. We describe here this evolving methodological shift, attributed to the advent of big, but often incomplete and poorly curated, data, emphasizing the underlying similarity of the newer quantitative to the traditional comparative methods and discussing when and to what extent the former have advantages over the latter. In this review, we cover briefly both randomization tests for detecting patterns in a largely model-independent fashion and phylolinguistic methods for a more model-based analysis of these patterns. We foresee a fruitful division of labor between the ability to computationally process large volumes of data and the trained linguistic insight declaring worthy prior commitments and interesting hypotheses in need of comparison.
- Research Organization:
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Sponsoring Organization:
- USDOE
- Grant/Contract Number:
- 89233218CNA000001
- OSTI ID:
- 1484654
- Report Number(s):
- LA-UR--18-24872
- Journal Information:
- Journal of Language Evolution, Vol. 3, Issue 2; ISSN 2058-4571
- Publisher:
- Oxford University Press
- Country of Publication:
- United States
- Language:
- English