References

Abeysooriya, Mandhri, Megan Soria, Mary Sravya Kasu & Mark Ziemann. 2021. Gene name errors: Lessons not learned. PLOS Computational Biology 17(7). e1008984. https://doi.org/10.1371/journal.pcbi.1008984.

Acheson, Daniel J., Justine B. Wells & Maryellen C. MacDonald. 2008. New and updated tests of print exposure and reading abilities in college students. Behavior Research Methods 40(1). 278–289. https://doi.org/10.3758/brm.40.1.278.

Alhazmi, Fahd. 2020. A visual interpretation of the standard deviation. Medium. https://towardsdatascience.com/a-visual-interpretation-of-the-standard-deviation-30f4676c291c.

Almeida, Alexandre, Adam Loy & Heike Hofmann. 2018. ggplot2 compatible quantile-quantile plots in R. The R Journal 10(2). 248–261. https://doi.org/10.32614/RJ-2018-051.

alvinashcraft, alexbuckgit, ArcticLampyrid & bearmannl. 2022. Maximum path length limitation. Learn Microsoft. https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation.

Baayen, R. Harald. 2008. Analyzing linguistic data: A practical introduction to statistics using R. Cambridge University Press.

Barr, Dale & Lisa DeBruine. 2023. webexercises: Create interactive web exercises in “R Markdown” (formerly “webex”). https://doi.org/10.32614/CRAN.package.webexercises.

Barrett, Malcolm. 2018. Why should I use the here package when I’m already using projects? https://malco.io/articles/2018-11-05-why-should-i-use-the-here-package-when-i-m-already-using-projects.

Ben-Shachar, Mattan, Daniel Lüdecke & Dominique Makowski. 2020. effectsize: Estimation of effect size indices and standardized parameters. Journal of Open Source Software 5(56). 2815. https://doi.org/10.21105/joss.02815.

Berez-Kroeker, Andrea L., Bradley McDonnell, Eve Koller & Lauren B. Collister. 2022. The open handbook of linguistic data management. MIT Press. https://doi.org/10.7551/mitpress/12200.001.0001.

Bochynska, Agata, Liam Keeble, Caitlin Halfacre, Joseph V. Casillas, Irys-Amélie Champagne, Kaidi Chen, Melanie Röthlisberger, Erin M. Buchanan & Timo B. Roettger. 2023. Reproducible research practices and transparency across linguistics. Glossa Psycholinguistics 2(1). https://doi.org/10.5070/G6011239.

Breheny, Patrick & Woodrow Burchett. 2017. Visualization of regression models using visreg. The R Journal 9(2). 56. https://doi.org/10.32614/RJ-2017-046.

Bryan, Jennifer. 2018. Let’s Git started: Happy Git and GitHub for the useR. Open Education Resource. https://happygitwithr.com/.

Bryan, Jenny. 2017. Project-oriented workflow. Tidyverse.org. https://www.tidyverse.org/blog/2017/12/workflow-vs-script/.

Busterud, Guro, Anne Dahl, Dave Kush & Kjersti Faldet Listhaug. 2023. Verb placement in L3 French and L3 German: The role of language-internal factors in determining cross-linguistic influence from prior languages. Linguistic Approaches to Bilingualism 13(5). 693–716. https://doi.org/10.1075/lab.22058.bus.

Center for Open Science. 2025. Choosing the right preregistration template: A guide for researchers. https://www.cos.io/blog/choosing-preregistration-template-guide-for-researchers.

Çetinkaya-Rundel, Mine & Johanna Hardin. 2021. Introduction to modern statistics. Second edition. Leanpub. https://openintro-ims.netlify.app/.

Cleveland, William S. & Robert McGill. 1987. Graphical perception: The visual decoding of quantitative information on graphical displays of data. Journal of the Royal Statistical Society: Series A (General) 150(3). 192–210. https://doi.org/10.2307/2981473.

Cohen, Jacob. 1988. Statistical power analysis for the behavioral sciences. 2nd edition, reprint. New York, NY: Psychology Press.

TEI Consortium. 2025. TEI P5: Guidelines for electronic text encoding and interchange. Zenodo. https://doi.org/10.5281/zenodo.17161156.

Dąbrowska, Ewa. 2019. Experience, aptitude, and individual differences in linguistic attainment: A comparison of native and nonnative speakers. Language Learning 69(S1). 72–100. https://doi.org/10.1111/lang.12323.

Dauber, Daniel. 2024. R for non-programmers: A guide for social scientists. Open Education Resource. https://bookdown.org/daniel_dauber_io/r4np_book/.

Douglas, Alex, Deon Roos, Francesca Mancini & David Lusseau. 2024. An introduction to R. https://intro2r.com/.

Ekman, Paul & Wallace V. Friesen. 1978. Facial action coding system. Environmental Psychology & Nonverbal Behavior.

Few, Stephen. 2007. Save the pies for dessert. http://www.perceptualedge.com/articles/08-21-07.pdf.

Field, Andy P., Jeremy Miles & Zoë Field. 2012. Discovering statistics using R. Sage.

Fox, John & Sanford Weisberg. 2019. An R companion to applied regression. Third edition. SAGE.

Fricke, Lea, Patrick G. Grosz & Tatjana Scheffler. 2024. Semantic differences in visually similar face emojis. Language and Cognition 1–15. https://doi.org/10.1017/langcog.2024.12.

Fugate, Jennifer M. B. & Courtny L. Franco. 2021. Implications for emotion: Using anatomically based facial coding to compare emoji faces across platforms. Frontiers in Psychology 12. 605928. https://doi.org/10.3389/fpsyg.2021.605928.

Garnier, Simon, Noam Ross, Bob Rudis, Antoine Filipovic-Pierucci, Tal Galili, Timelyportfolio, Alan O’Callaghan, et al. 2023. sjmgarnier/viridis: CRAN release v0.6.3. Zenodo. https://doi.org/10.5281/ZENODO.4679423.

Gelman, Andrew. 2018. Ethics in statistical practice and communication: Five recommendations. Significance 15(5). 40–43. https://doi.org/10.1111/j.1740-9713.2018.01193.x.

Gelman, Andrew. 2019. Embracing variation and accepting uncertainty: Implications for science and metascience. https://www.youtube.com/watch?v=VQCcMP4A5Ks.

Godfrey, A. Jonathan R., Debra Warren, Deepayan Sarkar, Gabriel Becker, James Thompson, Paul Murrell, Timothy Bilton & Volker Sorge. 2025. BrailleR: Improved access for blind users. https://github.com/ajrgodfrey/BrailleR.

Good, Jeff. 2022. The scope of linguistic data. In Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller & Lauren B. Collister (eds.), The open handbook of linguistic data management, 27–47. MIT Press. https://doi.org/10.7551/mitpress/12200.001.0001.

Gries, Stefan Th. & Nick C. Ellis. 2015. Statistical measures for usage-based linguistics. Language Learning 65(S1). 228–255. https://doi.org/10.1111/lang.12119.

Gries, Stefan Thomas. 2021. Statistics for linguistics with R: A practical introduction (De Gruyter Mouton Textbook). 3rd revised edition. De Gruyter Mouton.

Grömping, Ulrike. 2006. Relative importance for linear regression in R: The package relaimpo. Journal of Statistical Software 17(1). https://doi.org/10.18637/jss.v017.i01.

Grosz, Patrick Georg, Gabriel Greenberg, Christian De Leon & Elsi Kaiser. 2023. A semantics of face emoji in discourse. Linguistics and Philosophy 46(4). 905–957. https://doi.org/10.1007/s10988-022-09369-8.

Haroz, Steve. 2022. Comparison of preregistration platforms. https://doi.org/10.31222/osf.io/zry2u.

Harrell, Frank E. 2015. Regression modeling strategies: With applications to linear models, logistic and ordinal regression, and survival analysis (Springer Series in Statistics). Springer International Publishing. https://doi.org/10.1007/978-3-319-19425-7.

Horst, Allison & Julie Lowndes. 2020. Openscapes - Tidy data for efficiency, reproducibility, and collaboration. https://openscapes.org/blog/2020-10-12-tidy-data/.

Hvitfeldt, Emil. 2021. paletteer: Comprehensive collection of color palettes. https://github.com/EmilHvitfeldt/paletteer.

Iannone, Richard, Joe Cheng, Barret Schloerke, Shannon Haughton, Ellis Hughes, Alexandra Lauer, Romain François, JooYoung Seo, Ken Brevoort & Olivier Roy. 2025. gt: Easily create presentation-ready display tables. https://doi.org/10.32614/CRAN.package.gt.

iris-database.org. 2011. IRIS. https://iris-database.org/.

Kaufman, Allison B. & James C. Kaufman (eds.). 2018. The illusion of causality: A cognitive bias underlying pseudoscience. In Pseudoscience. The MIT Press. https://doi.org/10.7551/mitpress/10747.003.0007.

Kung, Susan Smythe. 2022. Developing a data management plan. In Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller & Lauren B. Collister (eds.), The open handbook of linguistic data management, 101–115. MIT Press. https://doi.org/10.7551/mitpress/12200.001.0001.

Lakens, Daniël. 2022. Improving your statistical inferences. Zenodo. https://doi.org/10.5281/ZENODO.6409077.

Lausberg, Hedda & Han Sloetjes. 2009. Coding gestural behavior with the NEUROGES-ELAN system. Behavior Research Methods 41(3). 841–849. https://doi.org/10.3758/BRM.41.3.841.

Le Foll, Elen. 2022. Textbook English: A corpus-based analysis of the language of EFL textbooks used in secondary schools in France, Germany and Spain. Osnabrück University PhD thesis. https://doi.org/10.48693/278.

Lenth, Russell V. 2025. emmeans: Estimated marginal means, aka least-squares means. https://rvlenth.github.io/emmeans/.

Levshina, Natalia. 2015. How to do linguistics with R: Data exploration and statistical analysis. John Benjamins.

Levshina, Natalia. 2022. Comparing Bayesian and frequentist models of language variation: The case of help + (to-)infinitive. In Ole Schützler & Julia Schlüter (eds.), Data and methods in corpus linguistics: Comparative approaches, 224–258. 1st edn. Cambridge University Press. https://doi.org/10.1017/9781108589314.009.

Lindeman, Richard Harold, Peter Francis Merenda & Ruth Z. Gold. 1980. Introduction to bivariate and multivariate analysis. Scott, Foresman.

Lüdecke, Daniel. 2020. sjPlot: Data visualization for statistics in social science. https://CRAN.R-project.org/package=sjPlot.

Lüdecke, Daniel, Mattan S. Ben-Shachar, Indrajeet Patil, Philip Waggoner & Dominique Makowski. 2021. performance: An R package for assessment, comparison and testing of statistical models. Journal of Open Source Software 6(60). 3139. https://doi.org/10.21105/joss.03139.

Lüdecke, Daniel, Indrajeet Patil, Mattan S. Ben-Shachar, Brenton M. Wiernik, Philip Waggoner & Dominique Makowski. 2021. see: An R package for visualizing statistical models. Journal of Open Source Software 6(64). 3393. https://doi.org/10.21105/joss.03393.

Maier, Emar. 2023. Emojis as pictures. Ergo 10. https://doi.org/10.3998/ergo.4641.

Matejka, Justin & George Fitzmaurice. 2017. Same stats, different graphs: Generating datasets with varied appearance and identical statistics through simulated annealing. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 1290–1294. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3025453.3025912.

Matute, Helena, Fernando Blanco, Ion Yarritu, Marcos Díaz-Lago, Miguel A. Vadillo & Itxaso Barberia. 2015. Illusions of causality: How they bias our everyday thinking and how they could be reduced. Frontiers in Psychology 6. https://doi.org/10.3389/fpsyg.2015.00888.

Mertzen, Daniela, Sol Lago & Shravan Vasishth. 2021. The benefits of preregistration for hypothesis-driven bilingualism research. Bilingualism: Language and Cognition 24(5). 807–812. https://doi.org/10.1017/S1366728921000031.

Mizumoto, Atsushi. 2023. Calculating the relative importance of multiple regression predictor variables using dominance analysis and random forests. Language Learning 73(1). 161–196. https://doi.org/10.1111/lang.12518.

Mizumoto, Atsushi & Luke Plonsky. 2016. R as a lingua franca: Advantages of using R for quantitative research in applied linguistics. Applied Linguistics 37(2). 284–291. https://doi.org/10.1093/applin/amv025.

Moroz, George. 2020. Create check-fields and check-boxes with checkdown. https://CRAN.R-project.org/package=checkdown.

Müller, Kirill. 2025. here: A simpler way to find your files. https://doi.org/10.32614/CRAN.package.here.

Neuwirth, Erich. 2022. RColorBrewer: ColorBrewer palettes. https://cran.r-project.org/web/packages/RColorBrewer/RColorBrewer.pdf.

Nicenboim, Bruno, Daniel Schad & Shravan Vasishth. 2026. Introduction to Bayesian Data Analysis for cognitive science (Chapman & Hall/CRC Statistics in the social and behavioral sciences series). Boca Raton London New York: CRC Press, Taylor & Francis Group. https://doi.org/10.1201/9780429342646.

Nimon, Kim F. 2012. Statistical assumptions of substantive analyses across the general linear model: A mini-review. Frontiers in Psychology 3. https://doi.org/10.3389/fpsyg.2012.00322.

Ou, Jianhong. 2021. colorBlindness: Safe color set for color blindness. https://CRAN.R-project.org/package=colorBlindness.

Paquot, Magali, Alexander König, Egon W. Stemle & Jennifer-Carmen Frey. 2024. The core metadata schema for learner corpora (LC-meta): Collaborative efforts to advance data discoverability, metadata quality and study comparability in L2 research. International Journal of Learner Corpus Research 10(2). 280–300. https://doi.org/10.1075/ijlcr.24010.paq.

Parsons, Sam, Flávio Azevedo, Mahmoud M. Elsherif, Samuel Guay, Owen N. Shahim, Gisela H. Govaart, Emma Norris, et al. 2022. A community-sourced glossary of open scholarship terms. Nature Human Behaviour 6(3). 312–318. https://doi.org/10.1038/s41562-021-01269-4.

Pedersen, Thomas Lin. 2024. patchwork: The composer of plots. https://patchwork.data-imaginist.com.

Pedersen, Thomas Lin & Maxim Shemanarev. 2024. ragg: Graphic devices based on AGG. https://ragg.r-lib.org.

Pfadenhauer, Katrin & Evelyn Wiesinger (eds.). 2024. Romance motion verbs in language change: Grammar, lexicon, discourse. De Gruyter. https://doi.org/10.1515/9783111248141.

Pfeifer, Valeria A., Emma L. Armstrong & Vicky Tzuyin Lai. 2022. Do all facial emojis communicate emotion? The impact of facial emojis on perceived sender emotion and text processing. Computers in Human Behavior 126. 107016. https://doi.org/10.1016/j.chb.2021.107016.

Plonsky, Luke & Frederick L. Oswald. 2014. How big is “big”? Interpreting effect sizes in L2 research. Language Learning 64(4). 878–912. https://doi.org/10.1111/lang.12079.

Prat, Chantel S., Tara M. Madhyastha, Malayka J. Mottarella & Chu-Hsuan Kuo. 2020. Relating natural language aptitude to individual differences in learning programming languages. Scientific Reports 10(1). 3817. https://doi.org/10.1038/s41598-020-60661-8.

R Core Team. 2024. R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/.

R Core Team. 2025. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Rodrigues, Bruno & Philipp Baumann. 2026. rix: Reproducible data science environments with “nix”. https://docs.ropensci.org/rix/.

Rodriguez-Sanchez, Francisco & Connor P. Jackson. 2025. grateful: Facilitate citation of R packages. https://pakillo.github.io/grateful/.

Roettger, Timo B. 2021. Preregistration in experimental linguistics: Applications, challenges, and limitations. Linguistics 59(5). 1227–1249. https://doi.org/10.1515/ling-2019-0048.

Scheffler, Tatjana & Ivan Nenchev. 2024. Affective, semantic, frequency, and descriptive norms for 107 face emojis. Behavior Research Methods 1–22. https://doi.org/10.3758/s13428-024-02444-x.

Schimke, Sarah, Israel de la Fuente, Barbara Hemforth & Saveria Colonna. 2018. First language influence on second language offline and online ambiguous pronoun resolution. Language Learning 68(3). 744–779. https://doi.org/10.1111/lang.12293.

Schweinberger, Martin. 2022. Data management, version control, and reproducibility. https://ladal.edu.au/repro.html.

Seibold, Heidi & Rabea Müller. 2023. BERD course: Make your research reproducible. https://doi.org/10.17605/OSF.IO/RUPT7.

Sievert, Carson. 2020. Interactive web-based data visualization with R, plotly, and shiny. Chapman & Hall/CRC. https://plotly-r.com.

Silge, Julia. 2022. janeaustenr: Jane Austen’s complete novels. https://CRAN.R-project.org/package=janeaustenr.

Smith, Gary. 2018. Step away from stepwise. Journal of Big Data 5(1). 1–12. https://doi.org/10.1186/s40537-018-0143-6.

Sonderegger, Morgan. 2023. Regression modeling for linguistic data. The MIT Press.

Sóskuthy, Márton. 2017. Generalised additive mixed models for dynamic analysis in linguistics: A practical introduction. https://doi.org/10.48550/arXiv.1703.05339.

University of South Carolina. Alternative text. Digital Accessibility. https://sc.edu/about/offices_and_divisions/digital-accessibility/toolbox/best_practices/alternative_text/.

Stefanowitsch, Anatol & Susanne Flach. 2017. The corpus-based perspective on entrenchment. In Hans-Jörg Schmid (ed.), Entrenchment and the psychology of language learning: How we reorganize and adapt linguistic knowledge, 101–127. De Gruyter. https://doi.org/10.1037/15969-006.

Tabachnick, Barbara G. & Linda S. Fidell. 2014. Using multivariate statistics (Always Learning). Pearson new international edition, sixth edition. Pearson.

The Turing Way Community. 2022. The Turing Way: A handbook for reproducible, ethical and collaborative research (1.0.2). Zenodo. https://doi.org/10.5281/zenodo.3233853.

Thompson, Bruce. 1995. Stepwise regression and stepwise discriminant analysis need not apply here: A guidelines editorial. Educational and Psychological Measurement 55(4). 525–534. https://doi.org/10.1177/0013164495055004001.

Trippel, Thorsten. 2025. Metadata for research data. In Piotr Bański, Ulrich Heid & Laura Herzberg (eds.), Harmonizing language data: Standards for linguistic resources, 251–279. De Gruyter. https://www.degruyterbrill.com/document/doi/10.1515/9783112208212-011/html.

Ushey, Kevin & Hadley Wickham. 2023. renv: Project environments. https://CRAN.R-project.org/package=renv.

Van Hulle, Sven & Renata Enghels. 2024a. The category of throw verbs as productive source of the Spanish inchoative construction. In Katrin Pfadenhauer & Evelyn Wiesinger (eds.), Romance motion verbs in language change, 213–240. De Gruyter. https://doi.org/10.1515/9783111248141-009.

Van Hulle, Sven & Renata Enghels. 2024b. TROLLing replication data for: “The category of throw verbs as productive source of the Spanish inchoative construction”. DataverseNO, V1. https://doi.org/10.18710/TR2PWJ.

Vasishth, Shravan & Andrew Gelman. 2021. How to embrace variation and accept uncertainty in linguistic and psycholinguistic data analysis. Linguistics 59(5). 1311–1342. https://doi.org/10.1515/ling-2019-0051.

Wickham, Hadley. 2016. ggplot2: Elegant graphics for data analysis. New York: Springer. https://ggplot2.tidyverse.org.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. Welcome to the tidyverse. Journal of Open Source Software 4(43). 1686. https://doi.org/10.21105/joss.01686.

Wickham, Hadley, Mine Çetinkaya-Rundel & Garrett Grolemund. 2023. R for data science: Import, tidy, transform, visualize, and model data. 2nd edition. O’Reilly. https://r4ds.hadley.nz/.

Wickham, Hadley, Romain François & Lucy D’Agostino McGowan. 2024. emo: Easily insert “emoji”. https://github.com/hadley/emo.

Wickham, Hadley, Davis Vaughan & Maximilian Girlich. tidyr: Tidy messy data. https://tidyr.tidyverse.org/.

Wilkinson, Leland. 2005. The Grammar of Graphics (Statistics and Computing). New York: Springer. https://doi.org/10.1007/0-387-28695-0.

Williams, Matt N., Carlos Alberto Gómez Grajales & Dason Kurkiewicz. 2013. Assumptions of multiple regression: Correcting two misconceptions. Practical Assessment, Research, and Evaluation 18(11).

Windhouwer, Menzo & Twan Goosen. 2022. Component metadata infrastructure. In Darja Fišer & Andreas Witt (eds.), CLARIN: The infrastructure for language resources, 191–222. De Gruyter. https://doi.org/10.1515/9783110767377-008.

Winter, Bodo. 2020. Statistics for linguists: An introduction using R. Routledge. https://doi.org/10.4324/9781315165547.

Withers, Peter. 2012. Metadata management with Arbil. In V. Arranz, D. Broeder, B. Gaiffe, M. Gavrilidou & M. Monachini (eds.), Proceedings of the workshop describing LRs with metadata: Towards flexibility and interoperability in the documentation of LR, 72–75. European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2012/workshops/11.LREC2012%20Metadata%20Proceedings.pdf#page=79.

Xie, Yihui. 2025. xfun: Supporting functions for packages maintained by “Yihui Xie”. https://doi.org/10.32614/CRAN.package.xfun.

Ye, Jiachu, Xiaoyan Lai & Gary Ka-Wai Wong. 2022. The transfer effects of computational thinking: A systematic review with meta-analysis and qualitative synthesis. Journal of Computer Assisted Learning 38(6). 1620–1638. https://doi.org/10.1111/jcal.12723.

Ziemann, Mark, Yotam Eren & Assam El-Osta. 2016. Gene name errors are widespread in the scientific literature. Genome Biology 17(1). 177. https://doi.org/10.1186/s13059-016-1044-7.