Next-step resources
In the hope that this textbook has inspired you to dive deeper into the wonderful world of quantitative linguistics, data analysis, statistics, data visualisation, and coding in R, here is a curated list of further resources to continue your learning journey! 🚀
As with the rest of this textbook (see Preface), this list is work in progress. Do drop me a line to let me know about any great resources that I have missed!
Recommended resources specific to the language sciences
Brezina, Vaclav. 2018. Statistics in Corpus Linguistics: A Practical Guide. Cambridge University Press. https://doi.org/10.1017/9781316410899.
Desagulier, Guillaume. 2017. Corpus Linguistics and Statistics with R: Introduction to Quantitative Methods in Linguistics (Quantitative Methods in the Humanities and Social Sciences). Springer International Publishing. https://doi.org/10.1007/978-3-319-64572-8.
Gries, Stefan Th. 2021. Statistics for linguistics with R: a practical introduction (De Gruyter Mouton Textbook). 3rd revised edition. de Gruyter Mouton. https://doi.org/10.1515/9783110718256.
LADAL contributors. Detailed step-by-step tutorials on a wide range of topics from R basics to text analytics, statistics, and data visualisation by the Language Technology and Data Analysis Laboratory (LADAL), The University of Queensland, Australia. https://ladal.edu.au/tutorials.html. Open Access.
Levshina, Natalia. 2015. How to do linguistics with R: Data exploration and statistical analysis. John Benjamins. https://doi.org/10.1075/z.195.
Francom, Jerid. 2025. An Introduction to Quantitative Text Analysis for Linguistics: Reproducible Research Using R. Routledge. https://doi.org/10.4324/9781003393764. Open Access.
Quené, Hugo & Huub van den Bergh. 2022. Quantitative Methods and Statistics. https://hugoquene.github.io/QMS-EN/. Also available in Dutch. Open Access.
Nicenboim, Bruno, Daniel Schad & Shravan Vasishth. 2026. Introduction to Bayesian Data Analysis for Cognitive Science (Chapman & Hall/CRC Statistics in the Social and Behavioral Sciences Series). CRC Press. https://doi.org/10.1201/9780429342646. Open Access version.
Rühlemann, Christoph. 2020. Visual Linguistics with R: A practical introduction to quantitative interactional linguistics. John Benjamins. https://doi.org/10.1075/z.228.
Schneider, Gerold. 2024. Text analytics for corpus linguistics and digital humanities: Simple R scripts and tools (Language, Data Science and Digital Humanities). Bloomsbury Academic.
Schneider, Gerold & Max Lauber. 2020. Statistics for Linguists. https://dlf.uzh.ch/openbooks/statisticsforlinguists/. Open Access.
Sonderegger, Morgan. 2023. Regression modeling for linguistic data. MIT Press. Open Access version.
Speelman, Dirk, Kris Heylen & Dirk Geeraerts. 2018. Mixed-Effects Regression Models in Linguistics (Quantitative Methods in the Humanities and Social Sciences). Springer International Publishing. https://doi.org/10.1007/978-3-319-69830-4.
Vasishth, Shravan, Daniel Schad, Audrey Bürki & Reinhold Kliegl. 2022. Linear Mixed Models in Linguistics and Psychology: A Comprehensive Introduction. https://vasishth.github.io/Freq_CogSci/. Open Access.
Winter, Bodo. 2020. Statistics for Linguists: An Introduction Using R. Routledge. https://doi.org/10.4324/9781315165547.
Further Open Educational Resources 🌍🌎🌏
This list focuses on OERs on statistics, data visualisation, computational literacy, reproducibility, and Open Science that are accessible to students of the language sciences. For open-access textbooks on linguistics, see Roberta D’Alessandro’s curated list: https://www.robertadalessandro.it/oa-textbooks.
Statistics
- Çetinkaya-Rundel, Mina & Johanna Hardin. 2024. Introduction to Modern Statistics. 2nd Edition. https://openintro-ims.netlify.app/.
- Clark, Michael & Seth Berry. 2025. Models Demystified: A Practical Guide from t-tests to Deep Learning. CRC Press. https://m-clark.github.io/book-of-models/.
- Gelman, Andrew & Aki Vehtari. 2024. Active statistics: Stories, games, problems, and hands-on demonstrations for applied regression and causal inference. Cambridge University Press. https://avehtari.github.io/ActiveStatistics/.
- Greenwood, Mark C. 2022. Intermediate Statistics with R. https://greenwood-stat.github.io/GreenwoodBookHTML/.
- Harrer, Mathias, Pim Cuijpers, Toshi Furukawa & David Ebert. 2021. Doing Meta-Analysis with R: A Hands-On Guide. Chapman & Hall/CRC Press. https://doing-meta.guide/.
- Jané, Matthew B., Qinyu Xiao, Siu Kit Yeung, Mattan S. Ben-Shachar, Aaron R. Caldwell, Denis Cousineau, Daniel J. Dunleavy, Mahmoud Elsherif, Blair T. Johnson, David Moreau, Paul Riesthuis, Lukas Röseler, James Steele, Felipe Fontana Vieira, Mircea Zloteanu & Gilad Feldman. 2024. Guide to Effect Sizes and Confidence Intervals. https://matthewbjane.quarto.pub/guide-to-effect-sizes-and-confidence-intervals/.
- Johnson, Alicia A., Miles Q. Ott & Mine Doğucu. 2021. Bayes Rules! An Introduction to Applied Bayesian Modeling. CRC Press. https://www.bayesrulesbook.com/.
- Lakens, Daniël. 2022. Improving Your Statistical Inferences. https://lakens.github.io/statistical_inferences/.
- Llaudet, Elena. 2025. Data Analysis for Social Science (DSS). https://ellaudet.github.io/dss_instructor_resources/.
- Navarro, Daniel. n.d. Learning statistics with R: A tutorial for psychology students and other beginners. https://learningstatisticswithr.com.
Data Visualisation
- Healy, Kieran. 2026. Data Visualization: A Practical Introduction. Second Edition. https://socviz.co/.
- Heiss, Andrew. 2023. Data Visualization with R. https://datavizf23.classes.andrewheiss.com/.
- Holtz, Yan & Conor Healy. 2018. From Data to Viz. https://www.data-to-viz.com/.
- Kabacoff, Robert. 2024. Modern Data Visualization with R. CRC Press. https://doi.org/10.1201/9781003299271. Open Access version: https://rkabacoff.github.io/datavis.
- Rennie, Nicola. 2026. The Art of Data Visualization with ggplot2: The TidyTuesday Cookbook. Soon to be published with CRC Press. Open Access version: https://nrennie.rbind.io/art-of-viz/.
- Mulvaney, Nora, Audrey Wubbenhorst & Amtoj Kaur. 2022. Critical Data Literacy: Strategies to Effectively Interpret and Evaluate Data Visualizations. https://pressbooks.library.torontomu.ca/criticaldataliteracy/.
- Wickham, Hadley, Danielle Navarro & Thomas Lin Pedersen. 2026. ggplot2: Elegant Graphics for Data Analysis (3e). Open Access version: https://ggplot2-book.org/.
- Wilke, Claus O. 2019. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. https://clauswilke.com/dataviz/.
Data Science and R
- Baruffa, Oscar. 2026. Big Book of R. A compendium of over 400 open-access books and other resources about R. https://www.bigbookofr.com/.
- Dauber, Daniel. 2025. R for Non-Programmers (R4NP). https://r4np.com/.
- Estrellado, Ryan A., Emily A. Freer, Joshua M. Rosenberg & Isabella C. Velásquez. 2020. Data science in education using R. 2nd edn. London, England: Routledge. https://datascienceineducation.com/.
- Imai, Kosuke & Nora Webb Williams. 2022. Quantitative Social Science: An Introduction in tidyverse. https://press.princeton.edu/books/paperback/9780691222288/quantitative-social-science.
- Wickham, Hadley, Mina Çetinkaya-Rundel & Garrett Grolemund. 2023. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. 2nd edn. O’Reilly Media. https://r4ds.hadley.nz/intro.
Reproducibility and Open Science
- Bryan, Jennifer. n.d. Happy Git and GitHub for the useR. https://happygitwithr.com/.
- Rodrigues, Bruno. 2023. Building reproducible analytical pipelines with R. https://raps-with-r.dev/.
- Röseler, Lukas, Lukas Wallrich, Helena Hartmann, Luisa Altegoer, Veronica Boyce, Sarahanne M. Field, Janik Goltermann, et al. 2025. Handbook for Reproduction and Replication Studies. Retrieved from https://forrt.org/replication_handbook. https://doi.org/10.5281/zenodo.16990114.
- Tierney, Nicholas. 2025. Quarto for scientists. https://qmd4sci.njtierney.com/.
- UCSB Carpentry. 2022. Introduction to Reproducible Publications with Quarto. https://carpentry.library.ucsb.edu/reproducible-publications-quarto/.
- Wittkuhn, Lennart, Konrad Pagenstedt & Liese Kümmerle. 2024. The Version Control Book: Track, organize and share your work: An introduction to Git for research. https://lennartwittkuhn.com/version-control-book/.
Computational literacy and AI
- Arregoitia, Luis D. Verde. 2026. Large Language Model tools for R. https://luisdva.github.io/llmsr-book/. [also available in Spanish]
- Bergstrom, Carl T. & Jevin D. West. n.d. Modern-Day Oracles or Bullshit Machines: How to thrive in a ChatGPT world. Online course. https://thebullshitmachines.com.
- Bryan, Jennifer, Jim Hester, Shannon Pileggi & E. David Aja. n.d. What they forgot to teach you about R: The stuff you need to know about R, besides data analysis. https://rstats.wtf/.
- Healy, Kieran. 2025. Modern Plain Text Computing. https://mptc.io/content/.
Text Analysis
- Buskin, Vladimir, Thomas Brunner & Philippa Adolf. n.d. Statistics and Data Analysis for Corpus Linguists: From Theory to Practice with R. https://vbuskin.github.io/Stats_with_R/.
- Hvitfeldt, Emil & Julia Silge. 2021. Supervised Machine Learning for Text Analysis in R. Boca Raton: Chapman and Hall/CRC. https://smltar.com/.
- Reinhart, Alex & David Brown. n.d. Text Analysis for Statistics and Data Science. https://browndw.github.io/textstat_docs/.
- Chen, Alvin Cheng-Hsien. 2023. Corpus Linguistics. https://alvinntnu.github.io/NTNU_ENC2036_LECTURES/.
- Silge, Julia, & David Robinson. 2017. Text Mining with R: A Tidy Approach. O’Reilly Media. https://www.tidytextmining.com.
➡️ See also the many great tutorials published at Programming Historian (PH) (in English, French, Spanish, and Portuguese) on topics as diverse as web scraping, text processing in R and Python, TEI encoding, machine learning, topic modelling, network analysis, and much more!
Python 🐍
- Huber, Florian. 2025. Introduction to Data Science with Python. v.0.23. https://florian-huber.github.io/data_science_course/book/cover.html.
Glossaries
- Carpentries Glosario. https://glosario.carpentries.org/en/.
- FORRT Glossary. https://forrt.org/glossary/: Parsons, Sam, Flávio Azevedo, Mahmoud M. Elsherif, Samuel Guay, Owen N. Shahim, Gisela H. Govaart, Emma Norris, et al. 2022. A community-sourced glossary of open scholarship terms. Nature Human Behaviour 6(3). 312–318. https://doi.org/10.1038/s41562-021-01269-4.
Selected R packages for the language sciences 📦
Eder, Maciej, Jan Rybicki & Mike Kestemont. 2016. {stylo}. Stylometry with R: A Package for Computational Text Analysis. The R Journal 8(1). 107–121. https://doi.org/10.32614/RJ-2016-007.
Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller & Akitaka Matsuo. 2018. {quanteda}: An R package for the quantitative analysis of textual data. Journal of Open Source Software 3(30). 774. https://doi.org/10.21105/joss.00774. https://quanteda.io.
Nini, Andrea & David van Leeuwen. 2024. {idiolect}: Forensic Authorship Analysis. https://github.com/andreanini/idiolect. Contains functions for authorship analysis and is based on {quanteda}.
Benoit, Kenneth, Johannes Gruber & Akitaka Matsuo. 2023. {spacyr}. https://github.com/quanteda/spacyr. Provides tokenisers, lemmatisers, dependency parsing, and more NLP tools for a host of languages (see also https://cran.r-project.org/web/packages/spacyr/vignettes/using_spacyr.html).
Gagolewski, Marek. 2022. {stringi}: Fast and portable character string processing in R. Journal of Statistical Software 103(2). 1–59. https://dx.doi.org/10.18637/jss.v103.i02. https://github.com/gagolews/stringi. Has many useful functions to work with string data and for text mining.
Luginbuhl, Félix & Hadley Wickham. 2021. {polyglot}. https://github.com/lgnbhl/polyglot. Uses R as an interactive learning environment to learn vocabulary in a foreign language.
Montes, Mariana. 2024. {glossr}. https://montesmariana.github.io/glossr/. Offers tools to include interlinear glosses in R Markdown or Quarto documents.
Schiborr, Nils Norman. 2025. {multicastR}. https://cran.r-project.org/web/packages/multicastR/index.html. Accesses the annotated database of spoken natural language from the Multi-CAST collection.
Sönning, Lukas. 2025. {tlda}. https://github.com/lsoenning/tlda. Implementation of various corpus-linguistic dispersion analysis metrics.
Taylor, Jack. 2025. {LexOPS}. https://jackedtaylor.github.io/LexOPSdocs/index.html. Generates experimentally matched stimuli.
Packages to access corpus data:
Blaette, Andreas. 2021. {GermaParl}. https://polmine.github.io/GermaParl/. Provides access to the annotated data of the CWB-indexed GermaParl corpus.
Harmon, Jon, Myfanwy Johnston, Jordan Bradford & David Robinson. 2025. {gutenbergr}. https://github.com/ropensci/gutenbergr. Downloads works from Project Gutenberg with the corresponding metadata.
Kupietz, Marc & Nils Diewald. 2025. {RKorAPClient}: ‘KorAP’ Web Service Client Package. https://github.com/KorAP/RKorAPClient. Allows for direct access to the corpora of the IDS platform ‘KorAP’.
Packages to work with ELAN data:
Wiesner, Katja & Nicolas Werner. 2024. {rELAN}. https://github.com/relan-package/rELAN/?tab=readme-ov-file. Imports ELAN data into R.
Moroz, George. 2025. {RCaucTile}. https://cran.rstudio.com/web/packages/RCaucTile/index.html. Creates maps for the East Caucasian language family.
Shim, Ryan Soh-Eun & John Nerbonne. 2022. {dialectR}. https://github.com/b05102139/dialectR. Performs quantitative analyses of dialects based on categorical measures of difference and on variants of edit distance. https://aclanthology.org/2022.vardial-1.3.pdf.