Next-step resources
In the hope that this textbook has inspired you to dive deeper into the wonderful world of quantitative linguistics, data analysis, statistics, data visualisation, and coding in R, here is a (work-in-progress) curated list of further resources to continue your learning journey! 🚀
As with the rest of this textbook (see Preface), this list is very much work in progress. Full bibliographic references will be added in due course. Do drop me a line to let me know about any great resources that I have missed!
Recommended resources specific to the language sciences (in alphabetical order)
Brezina, Vaclav. 2018. Statistics in Corpus Linguistics: A Practical Guide. Cambridge University Press. https://doi.org/10.1017/9781316410899.
Desagulier, Guillaume. 2017. Corpus Linguistics and Statistics with R: Introduction to Quantitative Methods in Linguistics (Quantitative Methods in the Humanities and Social Sciences). Springer International Publishing.
Francom, Jerid. 2025. An Introduction to Quantitative Text Analysis for Linguistics: Reproducible Research Using R. Routledge. https://doi.org/10.4324/9781003393764. Open Access.
Gries, Stefan Thomas. 2021. Statistics for linguistics with R: a practical introduction (De Gruyter Mouton Textbook). 3rd revised edition. De Gruyter Mouton.
Imai, Kosuke & Nora Webb Williams. 2022. Quantitative Social Science: An Introduction in tidyverse. https://press.princeton.edu/books/paperback/9780691222288/quantitative-social-science.
LADAL contributors. Tutorials of the Language Technology and Data Analysis Laboratory. https://ladal.edu.au/tutorials.html. Open Educational Resource.
Levshina, Natalia. 2015. How to do linguistics with R: Data exploration and statistical analysis. John Benjamins. https://doi.org/10.1075/z.195.
Rühlemann, Christoph. 2020. Visual Linguistics with R: A practical introduction to quantitative Interactional Linguistics. John Benjamins. https://doi.org/10.1075/z.228.
Schneider, Gerold. 2024. Text analytics for corpus linguistics and digital humanities: Simple R scripts and tools (Language, Data Science and Digital Humanities). Bloomsbury Academic.
Schneider, Gerold & Max Lauber. 2020. Statistics for Linguists. https://dlf.uzh.ch/openbooks/statisticsforlinguists/. Open Educational Resource.
Sonderegger, Morgan. 2023. Regression modeling for linguistic data. MIT Press. Open Access version.
Speelman, Dirk, Kris Heylen & Dirk Geeraerts (eds.). 2018. Mixed-Effects Regression Models in Linguistics (Quantitative Methods in the Humanities and Social Sciences). Springer International Publishing. https://doi.org/10.1007/978-3-319-69830-4.
Winter, Bodo. 2020. Statistics for Linguists: An Introduction Using R. Routledge. https://doi.org/10.4324/9781315165547.
Further Open Educational Resources (work in progress)
This list focuses on OERs on statistics, data visualisation, reproducibility, and Open Science. For open-access textbooks on linguistics, see Roberta D’Alessandro’s curated list: https://www.robertadalessandro.it/oa-textbooks.
Statistics
- Çetinkaya-Rundel, Mina & Johanna Hardin. 2024. Introduction to Modern Statistics. 2nd edn. https://openintro-ims.netlify.app/.
- Clark, Michael & Seth Berry. 2025. Models Demystified: A Practical Guide from t-tests to Deep Learning. CRC Press. https://m-clark.github.io/book-of-models/.
- Gelman, Andrew & Aki Vehtari. 2024. Active statistics: Stories, games, problems, and hands-on demonstrations for applied regression and causal inference. Cambridge University Press. https://avehtari.github.io/ActiveStatistics/.
- Greenwood, Mark C. 2022. Intermediate Statistics with R. Version 3.1. https://greenwood-stat.github.io/GreenwoodBookHTML/.
- Jané, Matthew B., Qinyu Xiao, Siu Kit Yeung, Mattan S. Ben-Shachar, Aaron R. Caldwell, Denis Cousineau, Daniel J. Dunleavy, Mahmoud Elsherif, Blair T. Johnson, David Moreau, Paul Riesthuis, Lukas Röseler, James Steele, Felipe Fontana Vieira, Mircea Zloteanu & Gilad Feldman. 2024. Guide to Effect Sizes and Confidence Intervals. https://matthewbjane.quarto.pub/guide-to-effect-sizes-and-confidence-intervals/.
- Johnson, Alicia A., Miles Q. Ott & Mine Doğucu. 2021. Bayes Rules! An Introduction to Applied Bayesian Modeling. CRC Press. https://www.bayesrulesbook.com/.
- Lakens, Daniël. 2022. Improving Your Statistical Inferences. https://lakens.github.io/statistical_inferences/.
- Llaudet, Elena. 2025. Data Analysis for Social Science (DSS). https://ellaudet.github.io/dss_instructor_resources/.
- Navarro, Daniel. n.d. Learning statistics with R: A tutorial for psychology students and other beginners. Version 0.6. https://learningstatisticswithr.com.
Data Visualization
- Heiss, Andrew. 2023. Data Visualization with R. https://datavizf23.classes.andrewheiss.com/.
- Holtz, Yan & Conor Healy. 2018. From Data to Viz. https://www.data-to-viz.com/.
- Kabacoff, Robert. 2024. Modern Data Visualization with R. CRC Press. https://doi.org/10.1201/9781003299271. Open access version: https://rkabacoff.github.io/datavis.
- Mulvaney, Nora, Audrey Wubbenhorst & Amtoj Kaur. 2022. Critical Data Literacy: Strategies to Effectively Interpret and Evaluate Data Visualizations. https://pressbooks.library.torontomu.ca/criticaldataliteracy/.
- Wilke, Claus O. 2019. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. https://clauswilke.com/dataviz/.
Data Science
- Dauber, Daniel. 2025. R for Non-Programmers (R4NP). https://r4np.com/.
- Estrellado, Ryan A., Emily A. Freer, Joshua M. Rosenberg & Isabella C. Velásquez. 2020. Data science in education using R. 2nd edn. London, England: Routledge. https://datascienceineducation.com/.
- Wickham, Hadley, Mina Çetinkaya-Rundel & Garrett Grolemund. 2023. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. 2nd edn. O’Reilly Media. https://r4ds.hadley.nz/intro.
Reproducibility and Open Science
- Bryan, Jennifer. n.d. Happy Git and GitHub for the useR. https://happygitwithr.com/.
- Rodrigues, Bruno. 2023. Building reproducible analytical pipelines with R. https://raps-with-r.dev/.
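- Röseler, Lukas. 2025. Open Science: Wie sich die Wissenschaft öffnet: Vertrauenskrise, Replikationsrevolution, und Open Science einfach und umfangreich erklärt [Open Science: How science is opening up: The crisis of trust, the replication revolution, and Open Science explained simply and comprehensively]. https://lukasroeseler.github.io/opensciencebuch/.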
- Tierney, Nicholas. 2025. Quarto for scientists. https://qmd4sci.njtierney.com/.
- UCSB Carpentry. 2022. Introduction to Reproducible Publications with Quarto. https://carpentry.library.ucsb.edu/reproducible-publications-quarto/.
- Wittkuhn, Lennart, Konrad Pagenstedt & Liese Kümmerle. 2024. The Version Control Book: Track, organize and share your work: An introduction to Git for research. Last update: October 29, 2025. https://lennartwittkuhn.com/version-control-book/.
Computational literacy
- Bryan, Jennifer, Jim Hester, Shannon Pileggi & E. David Aja. n.d. What they forgot to teach you about R: The stuff you need to know about R, besides data analysis. https://rstats.wtf/.
- Healy, Kieran. 2025. Modern Plain Text Computing. https://mptc.io/content/.
Text Analysis
- Buskin, Vladimir, Thomas Brunner & Philippa Adolf. n.d. Statistics and Data Analysis for Corpus Linguists: From Theory to Practice with R. https://vbuskin.github.io/Stats_with_R/.
- Chen, Alvin Cheng-Hsien. 2023. Corpus Linguistics. https://alvinntnu.github.io/NTNU_ENC2036_LECTURES/.
- Reinhart, Alex & David Brown. n.d. Text Analysis for Statistics and Data Science. https://browndw.github.io/textstat_docs/.
- Silge, Julia & David Robinson. 2017. Text Mining with R: A Tidy Approach. O’Reilly Media. https://www.tidytextmining.com.
Python
- Huber, Florian. 2025. Introduction to Data Science with Python. v.0.23. https://florian-huber.github.io/data_science_course/book/cover.html.
- Quené, Hugo & Huub van den Bergh. 2024. Quantitative Methods and Statistics. Retrieved 27 Aug 2024 from https://hugoquene.github.io/QMS-EN/.
Glossaries
- Carpentries Glosario. https://glosario.carpentries.org/en/.
- FORRT Glossary. https://forrt.org/glossary/. Parsons, Sam, Flávio Azevedo, Mahmoud M. Elsherif, Samuel Guay, Owen N. Shahim, Gisela H. Govaart, Emma Norris, et al. 2022. A community-sourced glossary of open scholarship terms. Nature Human Behaviour 6(3). 312–318. https://doi.org/10.1038/s41562-021-01269-4.
Corpora and other language resources
- CLARIN Resource Families: https://www.clarin.eu/resource-families.
- Language Archive Cologne: https://lac.uni-koeln.de/.
Selected R packages for the language sciences
Schmitz, Dominic & Janina Esser. 2021. SfL: Statistics for Linguistics. R package version 0.4. https://github.com/dosc91/SfL.
Nini, Andrea & David van Leeuwen. 2024. idiolect: Forensic Authorship Analysis. https://github.com/andreanini/idiolect. The package contains functions for authorship analysis and is based on {quanteda}.
Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller & Akitaka Matsuo. 2018. quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software 3(30). 774. doi:10.21105/joss.00774. https://quanteda.io. The package is useful “for managing and analyzing text” (https://github.com/quanteda/quanteda).
Benoit, Kenneth, Johannes Gruber & Akitaka Matsuo. 2023. spacyr. https://github.com/quanteda/spacyr. The package is useful for tokenization, lemmatizing tokens, parsing dependencies and identifying token sequences (https://cran.r-project.org/web/packages/spacyr/vignettes/using_spacyr.html).
Gagolewski, Marek. 2022. stringi: Fast and portable character string processing in R. Journal of Statistical Software 103(2). 1–59. https://dx.doi.org/10.18637/jss.v103.i02. https://github.com/gagolews/stringi. The package is useful for working with string data and for text mining (https://www.jstatsoft.org/article/view/v103i02).
Luginbuhl, Félix & Hadley Wickham. 2021. polyglot. https://github.com/lgnbhl/polyglot. The package uses R as “an interactive learning environment” (https://github.com/lgnbhl/polyglot?tab=readme-ov-file) for learning foreign-language vocabulary.
Montes, Mariana. 2024. glossr: Use Interlinear Glosses in R Markdown. R package version 0.8.0. https://montesmariana.github.io/glossr/. The package offers “tools to include interlinear glosses in your R Markdown or Quarto file” (https://montesmariana.github.io/glossr/).
Schiborr, Nils Norman. 2025. multicastR: A Companion to the Multi-CAST Collection. https://cran.r-project.org/web/packages/multicastR/index.html. The package is useful for accessing the annotated database of spoken natural language from the Multi-CAST collection.
Sönning, Lukas. 2025. tlda. https://github.com/lsoenning/tlda. The package includes functions for corpus-linguistic dispersion analysis.
Taylor, Jack. 2025. LexOPS. https://jackedtaylor.github.io/LexOPSdocs/index.html. The package is useful for generating experimentally matched stimuli.
Packages with corpus data:
Blaette, Andreas. 2021. GermaParl. https://polmine.github.io/GermaParl/. The package provides access to the annotated data of the CWB-indexed GermaParl corpus.
Harmon, Jon, Myfanwy Johnston, Jordan Bradford & David Robinson. 2025. gutenbergr. https://github.com/ropensci/gutenbergr. The package is useful for downloading works from Project Gutenberg together with their metadata.
Kupietz, Marc & Nils Diewald. 2025. RKorAPClient: ‘KorAP’ Web Service Client Package. https://github.com/KorAP/RKorAPClient. The package is useful for accessing corpora of the corpus analysis platform ‘KorAP’.
Packages to work with ELAN data:
Moroz, George. 2025. RCaucTile. https://cran.rstudio.com/web/packages/RCaucTile/index.html. The package is useful for creating maps of the East Caucasian language family.
Shim, Ryan Soh-Eun & John Nerbonne. 2022. dialectR. https://github.com/b05102139/dialectR. The package is useful “for performing quantitative analyses of dialects based on categorical measures of difference and on variants of edit distance” (https://aclanthology.org/2022.vardial-1.3.pdf).