15  What’s next? AI-assisted reseaRch?

If you are reading the last chapter of what is by no means a short textbook, you are likely someone who is eager to acquire new skills and knowledge. As such, you are well aware that learning is a process that requires intrinsic motivation, time, and a great deal of effort and patience. Many tech companies, by contrast, are keen for us to believe that this tiring, time-consuming experience of human learning is no longer necessary: So-called ‘artificial intelligence’ (AI) systems are designed to spare us the burden of thinking for ourselves. If the answers to our (research) questions are just a few effortless prompts away, it is legitimate to ask: Is it still worth devoting so much time and effort to acquiring and further developing knowledge and skills in data analysis, programming, and statistics?

This concluding chapter explains why we can answer this question with a decisive “yes”. To this end, we consider the role of AI in research and learning before recapping what we have learnt so far and where to go from there.

15.1 On the technology behind AI 🤖

Contrary to popular belief, ‘artificial intelligence’ is not a technology. It is, and always has been, a marketing term used to sell (or secure research funding for) a wide, diverse, and shifting array of ideas, research projects, systems, and technologies (see Figure 15.1 from Guest et al. (2025); also Rooij et al. (2024)). Despite marketing claims, these systems and technologies are not ‘intelligent’; they cannot think, understand, or reason (see e.g. Quattrociocchi, Capraro & Perc 2025). This is not to say that they do not have valuable applications, but rather that we need to critically evaluate promotional claims and anthropomorphising language (e.g. AI says / thinks / knows / understands).

Figure 15.1: A cartoon set theoretic view of terms commonly used when discussing the superset AI from Guest et al. (2025): Figure 1 (CC BY 4.0)

Popular, commercial chat-based AI products such as ChatGPT, Claude, CoPilot, and Gemini are powered by Large Language Models (LLMs). LLMs are statistical models fitted to huge amounts of training data. Given a prompt, they generate a probable response based on statistical patterns found in that training data, on reinforcement learning from human feedback, and on additional opaque (usually profit-driven) criteria (see e.g. Bender & Hanna 2025). At first sight, the underlying principle of next-token prediction is comparable to the autocomplete functions on our phones and in the Google search bar (see Figure 15.2).

A looping GIF of a screen capture showing someone typing "LLMs are not" into Google and various automated completions pop up like "LLMs are not ai".
Figure 15.2: Autocomplete function on web search query (google.com on 8 February 2026)
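To make the autocomplete analogy concrete, here is a deliberately tiny, frequency-based ‘next-word predictor’ written in base R. This is a toy illustration only: real LLMs operate on sub-word tokens, billions of parameters, and learned attention weights, not raw bigram counts, and the three-sentence ‘corpus’ below is invented for this sketch.

```r
corpus <- c("llms are not ai",
            "llms are not intelligent",
            "llms are statistical models")

# Collect all consecutive word pairs (bigrams), sentence by sentence
bigrams <- unlist(lapply(strsplit(corpus, " "), function(w) {
  paste(w[-length(w)], w[-1])
}))

# 'Predict' the most frequent continuation of a given word
predict_next <- function(word) {
  candidates <- bigrams[startsWith(bigrams, paste0(word, " "))]
  if (length(candidates) == 0) return(NA_character_)
  continuations <- sub("^\\S+ ", "", candidates)
  names(sort(table(continuations), decreasing = TRUE))[1]
}

predict_next("are")  # "not" ("are not" occurs twice, "are statistical" once)
```

The predictor has no notion of meaning whatsoever: it simply returns the continuation that was most frequent in its training data, which is the shared principle behind next-token prediction.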

It goes without saying that modern LLMs are much more powerful predictive text generators than the models that once offered to help us draft our text messages. This is due to four key factors:

  1. The development of a novel algorithmic architecture known as a transformer that is based on attention mechanisms (Vaswani et al. 2017).

  2. The availability and largely unethical, if not outright illegal (see e.g. Samuelson 2023; Lucchi 2024) scraping of huge amounts of training data from the internet, including (academic) books and articles, but also vast amounts of blog posts, social media data, forum discussions, Wikipedia articles, and YouTube videos.

  3. The availability of (relatively) cheap, large-scale computational power (which, however, still comes at a high environmental cost, see e.g. Luccioni et al. 2025; Schön, Hoffmann & Becker 2025).

  4. Reinforcement learning from human feedback, a process whereby crowdsourced human workers mostly from low-income countries (see e.g. Perrigo 2023) provide extensive feedback on LLM outputs to fine-tune models for what humans want to obtain when they query a model (see Bender & Hanna 2025: Ch 3).

Despite its name, OpenAI — the main company behind ChatGPT — does not develop open-source LLMs, nor is the company a not-for-profit initiative. So why have OpenAI and other large tech firms been offering their AI products for free or at prices well below actual running (let alone development) costs? Two reasons are worth considering. First, because one major bottleneck to improving current LLMs is access to new data. Human-generated data is highly valuable and chatbot users are providing lots of it. AI companies are harvesting this data to train the next generation of LLMs. Second, because free or cheap access to AI products encourages us to become reliant on them for all kinds of work-related tasks and personal activities. As they become ubiquitous in our everyday lives, we are naturally inclined to rely on and trust their outputs. It is only a matter of time until subscription prices are hiked up and/or promoted content is (more or less transparently) integrated in model outputs (see e.g. Mühlhoff 2025: 91-97).

Tip: Your turn!

Q15.1 Which of the following analogies have (rightly or wrongly) been used by different scholars to refer to Large Language Models (LLMs)? You may have heard of one or a few of these, but can you guess them all?









🦜 Click on the parrot for a hint.

The term ‘stochastic parrot’ 🦜 is probably the best known LLM metaphor. It was introduced by computational linguist Emily M. Bender and colleagues in 2021 to characterise LLMs as systems that mimic text without true understanding, highlighting their limitations in processing meaning. This metaphor emphasises that LLMs generate outputs based on statistical patterns in their training data, similar to how parrots mimic sounds without comprehension.

In 2022, the cognitive scientist Iris van Rooij published a blog post in which she succinctly explains why she believes that LLMs cannot legitimately be used for academic writing because they essentially “automate plagiarism”.

The ‘spicy autocomplete’ metaphor 🌶️ is difficult to trace back to one or more specific author(s). It suggests that LLMs are just fancy versions of a smartphone’s predictive text, which predicts the next word based on what came before, except that LLMs add some randomness, i.e. spice, to the output. This framing also implies that LLMs are pattern-matching algorithms without real understanding or problem-solving ability (see Groß 2024).

In 2024, Hicks, Humphries & Slater published a paper in the journal ‘Ethics and Information Technology’ entitled ‘ChatGPT is bullshit’, in which they argue that the output of LLMs is best understood as ‘bullshit’ in the philosophical sense described by Frankfurt (2005) because LLMs are indifferent to the truth of their outputs.

‘Synthetic text extruding machines’ is a term that Emily M. Bender and Alex Hanna like to use, e.g. in their 2025 book entitled ‘The AI Con: How to Fight Big Tech’s Hype and Create the Future We Want’. They describe the process of LLM-generated texts by explaining that, “[l]ike an industrial plastic process, language corpora are forced through complicated machinery to produce a product that looks like communicative language, but without any intent or thinking mind behind it.”

As an alternative to well-established metaphors such as the ones listed above, classical philologist Gyburg Uhlmann proposed ‘kitsch’ as a new metaphor to describe the output of LLMs. She argues that ‘kitsch’ “is particularly suitable for analytically illuminating a previously neglected feature of LLM-based images and texts: their tendency to produce homogeneous and average content, which is […] leading to the equalisation of language, style and argument” (Uhlmann 2025).

15.2 On the value of critical thinking

In educational contexts, the challenges brought about by AI have been compared to the introduction of pocket calculators in the 1960s. The comparison is an interesting one. Although calculators are now widely available and perform school-level arithmetic to perfection, we still first teach school pupils to do mathematics without a calculator because we know that this is necessary to develop an understanding of what a calculator does. Following this analogy, we should still teach and learn data analysis, statistics, and programming, even if AI were to generate error-free solutions to these tasks. The crux of the problem is how to convince ourselves that it is worth learning to do things the hard way given how convenient and effortless AI products seem. More on that in Section 15.3 but, for now, let’s return to our calculator analogy and consider another, crucial issue: the reliability of LLM outputs.

A calculator is a deterministic system that will always provide the same, correct answer to a mathematical operation. By contrast, an LLM is a stochastic model that is fitted to generate probable outputs. These are generated by highly complex black-box algorithms that are based on (often illegally and/or unethically acquired) training data, reinforcement learning from human feedback, and additional company-internal fine-tuning, as well as a degree of randomness. This means that, unlike calculators, their outputs are irreproducible and therefore unreliable.
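The contrast between deterministic and stochastic systems can be illustrated in a few lines of R. This is a toy sketch: commercial chatbots do not expose a random seed to their users, so their outputs cannot be ‘fixed’ in the way shown below.

```r
# Deterministic, calculator-style computation: same input, same output
sqrt(2) == sqrt(2)  # always TRUE

# Stochastic generation: sampling yields different outputs on each run
vocab <- c("cat", "sat", "on", "the", "mat")
sample(vocab, size = 3)  # varies from run to run
sample(vocab, size = 3)  # very likely different again

# Reproducibility requires explicitly fixing the random seed
set.seed(42)
run1 <- sample(vocab, size = 3)
set.seed(42)
run2 <- sample(vocab, size = 3)
identical(run1, run2)  # TRUE
```

In our own analyses, `set.seed()` is precisely what makes simulations and resampling procedures reproducible; the black-box nature of commercial LLMs offers no such guarantee.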

A marketing term that tech companies have been pushing to describe this inherent lack of reliability is “hallucination”. However, as LLMs have no way of representing what is true or false, it is misleading to speak of LLMs “hallucinating”. LLMs are more likely to generate outputs that are truthful when they have been trained on a lot of reliable data on the subject but, by definition, they cannot evaluate their sources — which can range from high-quality, peer-reviewed academic journals to satirical Reddit comments written by cheeky teenagers — and, crucially, have no understanding of their contents.

Given that their outputs are both irreproducible and unreliable, commercial AI tools are unsuitable for most research-related activities. For writing, LLMs not only output texts without any form of fact-checking, they also automate plagiarism as they fail to credit the authors whose texts they were trained on. In this context, it is worth noting that any bibliographic references output by LLMs are also randomly generated text strings. They may or may not correspond to real sources. When the sources exist, they may or may not contain (some of) the information regurgitated by the LLMs. We have no way of knowing from the model output.

The same goes for literature reviews: having no access to the training data, this is not a task that we can responsibly delegate to an LLM. To make matters worse, AI systems are known to perpetuate and exacerbate biases (e.g. the ‘Matthew effect,’ see Pooley 2025). High-quality research requires us to put in the intellectual effort of searching, reading, and critically evaluating the literature ourselves. Like writing, this is an integral part of the research process.

Tip: Your turn!

Q15.2 Read Jeff Pooley’s short commentary ‘The Matthew Effect in AI Summary’ (archived version). What does the Matthew Effect refer to?





🦜 Click on the parrot for a hint.

Q15.3 Which term describes the phenomenon whereby the contributions of marginalised female scientists are overlooked or attributed to their male colleagues?





🦜 Click on the parrot for a hint.

Q15.4 According to Pooley (2025), which of the following biases in academia are likely to be aggravated by the use of Large Language Models (LLMs) to summarise academic literature and write research articles?





🦜 Click on the parrot for a hint.

Writing code for data (pre)processing, analysis, visualisation, and statistical modelling is also increasingly becoming part and parcel of research processes in the language sciences. Whilst it may be tempting to outsource (part of) the data analysis process to a machine learning algorithm, an LLM, or an AI agent, time spent examining our data (e.g. cleaning, wrangling, and visualising them) allows us to gain in-depth knowledge of our data, which later helps us spot issues in complex statistical analyses further down the road. It is much easier to critically interpret the outputs of complex models if we have a good intuitive sense of what is plausible based on our data. This is not something that we can responsibly delegate to even the most sophisticated AI.
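A single line of exploratory code is often all it takes to catch a problem that would silently distort a later model. A minimal sketch with invented reaction-time data (the variable names and the data-entry error below are hypothetical):

```r
# Hypothetical reaction-time data (in milliseconds) with one data-entry error
rt_data <- data.frame(
  participant = c("P1", "P2", "P3", "P4", "P5"),
  rt_ms       = c(512, 487, 530, -450, 501)
)

summary(rt_data$rt_ms)        # the negative minimum immediately flags a problem
rt_data[rt_data$rt_ms < 0, ]  # inspect the offending row before modelling
```

A reaction time cannot be negative, so the `summary()` output tells us at a glance that something went wrong at data entry — exactly the kind of plausibility check that gets skipped when the whole pipeline is delegated to an AI system.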

Comic showing a stick person holding a canoe paddle at his side and standing on top of a "big pile of linear algebra" containing a funnel labeled "data" and box labeled "answers". A second stick person stands to the left side of the panel. Person 2: This is your machine learning system?  Person 1: Yup! You pour the data into this big pile of linear algebra, then collect the answers on the other side.  Person 2: What if the answers are wrong?  Person 2: Just stir the pile until they start looking right.

An xkcd comic depicting a pile of algebra as a machine learning algorithm (xkcd CC BY-NC 2.5)

15.3 On the value of human learning

All this is not to say that AI products are never useful. Many people report successfully using AI for designing experiments, writing, and coding — including for (academic) research. Here, two factors are worth considering. First, the less we know about a domain, the more we tend to overestimate the quality of LLM outputs in this domain (see also Tully, Longoni & Appel 2025). When it comes to learning how to code and do statistics, political scientist, R package developer, and educator Andrew Heiss1 (2024) goes as far as saying that “using ChatGPT and other LLMs when learning R is actually really detrimental to learning, especially if you just copy/paste directly from what it spits out.” He goes on to explain that:

Using ChatGPT with R requires a good baseline knowledge of R to actually be useful. A good analogy for this is with recipes. ChatGPT is really confident at spitting out plausible-looking recipes. A few months ago, for fun, I asked it to give me a cookie recipe. I got back something with flour, eggs, sugar, and all other standard-looking ingredients, but it also said to include 3/4 cup of baking powder. That’s wild and obviously wrong, but I only knew that because I’ve made cookies before (Heiss 2024: n.p.).

This raises the question of how novices can reach the level of expertise necessary to reliably assess LLM outputs in a specific domain. Researchers are often short on time and LLMs are sold to us as a convenient way to take shortcuts. Indeed, AI products are designed to give us the illusion that we are more efficient and productive when we use and trust them. The risk is that, once we have become dependent, we are no longer able to compare how long we would have needed to complete the same task had we not relied on a third-party AI system to do so.

Tip: Your turn!

Read through Google’s “AI overview” displayed below.

Google query: 'cheese not sticking to pizza' The AI Overview reads: Here are some tips to help cheese stick to pizza: Mix about 1/8 cup of non-toxic glue, like Elmer's school glue, into the sauce to add some tackiness. Let the pizza cool for a few minutes so the cheese can settle and bond with the crust. Cook the cheese until it just melts, but don't overcook it.

Google AI overview to the query “cheese not sticking to pizza” from 2024

Reddit post by user "fucksmith" dating from 13y ago. The post reads: To get the cheese to stick I recommend mixing about 1/8 cup of Elmer's glue Q in with the sauce. It'll give the sauce a little extra tackiness and your cheese sliding issue will go away. It'll also add a little unique flavor. I like Elmer's school glue, but any glue will work as long as it's non-toxic. The post has 197 replies.

Satirical Reddit post from 2013
Figure 15.3: Screenshots from r/Pizza illustrating the infamous ‘pizza glue’ AI fail (see also Notopoulos 2024)

Q15.5 Among a number of sensible-sounding suggestions, we find a recommendation for adding non-toxic glue to pizza sauce. This mention is thought to have come from an old Reddit post (see screenshot above). Which aspect(s) of the AI overview point to this theory?







🦜 Click on the parrot for a hint.

When it comes to programming, the use of LLMs may seem less risky than when conducting literature reviews, writing prose, or analysing data. After all, we can test that generated code runs as expected. If it does not (which is often the case, although we might not immediately spot this), we can debug the code with the support of one or more LLMs until it does. Whilst this might feel like the path of least resistance, data scientist, R developer, and educator Mine Çetinkaya-Rundel explains that debugging someone else’s code (e.g. what an LLM produced) is considerably harder than debugging our own code (Chow, Wickham & Mckinney 2025). This is because, when it is our own code, we had an idea and followed a certain example or a specific strategy, and this knowledge helps us debug in a systematic way. Crucially, each problem or error is an opportunity to learn. Relying on LLM-generated code to solve these errors robs us of these opportunities.

AI can’t learn from its mistakes—it doesn’t understand why something failed. It just pattern-matches from training data (Stetskov 2025: n.p.).

Research on the mid- to long-term impact of AI usage on cognitive abilities such as writing, coding, and critical thinking is still in its infancy; however, a number of studies point towards a genuine risk of deskilling (see e.g. Ferdman 2025) in many domains of use (e.g. medicine, see Natali et al. 2025). In the context of AI-assisted coding, researchers from Anthropic (the company behind the Claude family of AI products) conducted a pre-registered experiment (Tamkin & Shen 2026) in which 52 (mostly junior) software developers completed a programming task using a Python library that they were not familiar with and were subsequently tested on their understanding of the code. Tamkin & Shen (2026) compared:

  1. how quickly the programmers completed the task with and without AI assistance and
  2. whether using AI made them less, more, or equally likely to understand the code they had just written.

Two plots: the first shows the average time taken to complete the experimental task. The average with AI was 23 minutes, and without AI nearly 25. But the confidence intervals overlap considerably. The second plot shows average test scores. Here the difference is much clearer: the AI group scores 50% on average, whilst the non-AI control group scores around 65%.
Figure 15.4: Difference in means (and 95% CI error bars) of overall task time and test scores between the treatment (AI Assistant) and control (No AI) groups (n = 52) (Source: Tamkin & Shen 2026: Fig. 1)

On average, the participants assigned to the AI assistance group completed the programming task about two minutes faster than those who worked without AI — a difference which was not statistically significant (p = 0.391). There was, however, a significant difference in their code comprehension test scores: the AI group averaged 50% on the test, compared to 67% in the group that did not have access to AI (Cohen’s d = 0.738, p = 0.01). Tamkin & Shen (2026) also report that the largest gap in scores between the two groups was on debugging questions. They attribute the gains in skill development of the group that did not use AI to “the process of encountering and subsequently resolving errors independently” (Tamkin & Shen 2026: 2-3).

Senior developers don’t grow out of thin air [… They] develop intuition through thousands of small failures (Stetskov 2025: n.p.).

Tip: Your turn!

As part of an exploratory (not pre-registered) analysis, Tamkin & Shen (2026) from Anthropic (see Section 15.3) decomposed the quiz scores into sub-areas and question types (see Figure 8 from Tamkin & Shen (2026), reprinted below). Each question in the quiz belonged to exactly one task (e.g., Task 1 or Task 2) and exactly one question type (e.g., Conceptual, Debugging, or Code Reading). Figure 8 shows that, for both tasks, the control (no AI) group performed better than the AI group.

A plot showing participants' average scores in each type of quiz questions together with 95% CI error bars. Across all subareas of the quiz (task 1, task 2, conceptual, debugging, and code reading), the control, no AI group performed better on average than the treatment, AI group. However, the 95% CI intervals of the two groups always overlap to a smaller or larger degree. The difference in average scores is largest for the debugging questions and smallest for the code reading.

Figure 8 from Tamkin & Shen (2026): Score breakdown by question type relating to each task and skill area

Q15.6 Looking at the results displayed in Figure 8 above, which question type shows the largest difference in average quiz scores between the treatment and control groups?




Q15.7 Why might the control group have, on average, performed better on debugging questions compared to the AI group?





🦜 Click on the parrot for a hint.

Q15.8 What do the results displayed in Figure 8 from Tamkin & Shen (2026) (see above) suggest about the impact of AI assistance on code reading skills?





15.4 On the value of the commons

It is worth pointing out that the programmers in the control, no AI group in the aforementioned Anthropic study (Tamkin & Shen 2026) did not write code with no help whatsoever. Instead, they had access to regular web searching and code documentation. It is a common misconception that learning to code involves memorising lots of functions and commands. In practice, programming always involves looking things up. Up until very recently, researchers would typically search the web to find answers to their coding problems instead of prompting an AI product. These searches led us to interesting forum discussions (e.g., this Q&A on justifying text in Quarto), blog posts by fellow researchers and developers (e.g. Andrew Heiss’ blog), and helpful documentation files (e.g. in the form of so-called ‘vignettes’). If our web search did not help us solve our problem, we prepared a reprex, short for minimal reproducible example (see Wickham, Çetinkaya-Rundel & Grolemund 2023: Ch. 8 on getting help), posted it on a dedicated forum and community members would typically provide helpful answers within days, hours, or even minutes.
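What might such a reprex look like in practice? A minimal sketch in base R: the code creates its own small dataset and uses only functions that ship with R, so that anyone on the forum can run it as-is. The data and the question are invented for illustration.

```r
# A self-contained reprex: it creates its own toy dataset, so helpers can
# run it without access to the original (possibly confidential) data.
scores <- data.frame(
  group = c("L1", "L1", "L2", "L2"),
  score = c(14, 13, 18, 17)
)

# The (hypothetical) question posted to the forum: is this the right way
# to compute per-group means in base R?
aggregate(score ~ group, data = scores, FUN = mean)
```

Self-containedness is the key design principle: a reprex that depends on files or objects only you have forces would-be helpers to guess, whereas one like the above can be copy-pasted, run, and answered within minutes.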

All of these contents were human-generated. Not all were 100% reliable, but high-quality answers to problems on StackOverflow, for example, were upvoted by readers and errors in the documentation of open-source packages were quickly identified and corrected by the community of users. The same principle was true of statistical questions. Today, this strategy is still viable but, unfortunately, there are a lot of AI-generated webpages (‘AI slop’) that one first needs to ignore.

Tip: Your turn!

Q15.9 What is Stack Exchange?






🦜 Click on the parrot for a hint.

Q15.10 Open this archived version of a Q&A about confidence intervals hosted on Cross Validated, Stack Exchange’s statistics forum. In which year did Eliott originally ask their question?

🦜 Click on the parrot for a hint.

Q15.11 As of 9 February 2026 when the page was archived, how many Cross Validated members had upvoted the top answer?





🦜 Click on the parrot for a hint.

In Section 11.2, we conducted the following t-test to find out whether the observed difference between the L1 and L2 speakers’ non-verbal IQ ‘Blocks’ test scores was significant or not:

t.test(formula = Blocks ~ Group, 
       data = Dabrowska.data)

    Welch Two Sample t-test

data:  Blocks by Group
t = -4.4084, df = 152.46, p-value = 1.956e-05
alternative hypothesis: true difference in means between group L1 and group L2 is not equal to 0
95 percent confidence interval:
 -5.239791 -1.996693
sample estimates:
mean in group L1 mean in group L2 
        13.84444         17.46269 
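Incidentally, the numbers in this printout can also be accessed programmatically rather than read off the console: the result of t.test() is a list (of class "htest") whose components we can inspect and reuse. The sketch below uses simulated toy data so that it is self-contained (the real Dabrowska.data is not re-created here).

```r
# Simulated stand-in for the Blocks-by-Group data (illustration only)
set.seed(2026)
toy_data <- data.frame(
  Group  = rep(c("L1", "L2"), each = 50),
  Blocks = c(rnorm(50, mean = 14, sd = 4), rnorm(50, mean = 17, sd = 5))
)
toy_test <- t.test(Blocks ~ Group, data = toy_data)

toy_test$p.value   # the p-value as a plain number
toy_test$conf.int  # the 95% confidence interval of the difference in means
toy_test$estimate  # the two group means
```

Extracting these components directly (rather than retyping them) avoids transcription errors and makes reports reproducible, e.g. when inlining results in a Quarto document.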

The output of the t.test() command above was fed into an LLM (DeepSeek R1 Distill Llama 70B) with a prompt requesting a short paragraph interpreting these results. The model’s response is printed below:

The Welch two‑sample t‑test shows a significant difference between the groups (t = ‑4.41, df ≈ 152.5, p = 1.96 × 10⁻⁵). Group L1 has a mean of 13.84 blocks, while group L2’s mean is 17.46 blocks, a mean difference of ‑3.62 blocks. The 95 % confidence interval for this difference (‑5.24 to ‑1.99) means we can be 95 % confident that the true average in L2 exceeds that in L1 by roughly 2 to 5 blocks. Because the interval does not include zero, the null hypothesis of equal means is rejected.

Q15.12 Aside from the fact that “blocks” is not the correct measurement unit, which major statistical error does the LLM’s interpretation of the t-test output include?






In addition to the flood of AI slop making it difficult to find reliable information, human content creators are finding that their intellectual property is being scraped without consent to be used as training data for commercial AI products. In the era of Open Scholarship, researchers, software developers, and educators have been sharing their work with the world for the benefit of scientific progress, typically with only authorship attribution as a reward. However, this may change as AI companies appropriate their work and LLMs automate plagiarism (Rooij 2022).

This may sound like an entirely dystopian situation but, up until fairly recently, researchers had to pay to use programming languages for statistical analyses and scientific computing. FORTRAN compilers, MATLAB, S, and SPSS were (and still are) proprietary software which were inaccessible to many researchers and students. It is no exaggeration to say that open-source, community-led programming languages such as R, Python, and Julia revolutionised data analysis, making state-of-the-art methods accessible to far more people. However, as high-quality open-access resources become rarer due to the contamination of AI slop and illegal scraping, we may be returning to an era of restricted access to scientific computing. Hence, it is worth remembering that it is communities of humans who have been developing programming languages such as R and Python, their many extensions such as the {tidyverse} packages (see also selected list of linguistics-specific R packages), open-source software such as RStudio, and high-quality documentation and Open Educational Resources.

These communities depend on collaboration, interactions, and mutual support. Yet, we have entered an age where human interactions are marketed as unproductive, time-consuming, and burdensome. We are told that they can easily be replaced by more efficient and “objective” chatbots.

It can’t be overstated the extent to which part of why these [AI] tools exist is to isolate us, to insert these [AI] companies in between us and all the relationships that we have (Chris Gilliard speaking on Tech Won’t Save Us on 16 October 2025).

Aside from the fact that LLMs are known to be prone to all kinds of very serious biases and that their efficiency is far from proven (see e.g. Loker 2025 on how AI code tends to create more problems than human-generated code), we should not lose sight of the value of subjective, human interactions. This is not to say that LLMs are never useful, but rather that I strongly encourage you to (also) devote time to learning from reliable, human-generated resources (see e.g. Next-step resources), join a course with other human beings to continue your learning journey, and/or find a learning buddy to discuss and solve problems together. This textbook was entirely human-generated and benefited greatly from countless rounds of revisions thanks to interactions with and feedback from (human!) students and colleagues (see Acknowledgements). Of course, it would have been quicker to write the textbook without asking for feedback or to get an LLM to generate first drafts of sections, code, and/or quiz questions. But in linguistics research, teaching, and learning, quality matters more than velocity.

Check your progress 🌟

Congratulations, you have reached the end of this textbook! In this concluding chapter, you tackled 12 ‘Your turn!’ questions. Check out the Next-step resources to continue your learning journey!

Note: More food for thought 🍏 🍎
Caution: Using LLMs in R

For us linguists, there is no doubt that LLMs are fascinating objects of study and that they can be useful for some research-related tasks, provided that their outputs be thoroughly evaluated. Once you have developed a sound understanding of how LLMs and AI products function and have critically reflected on their potential, limitations, and societal impact (see recommendations above), you may want to explore non-commercial, open-source LLMs (see Liesenfeld, Lopez & Dingemanse 2023 and the European Open Source AI Index) in R.

A useful starting point to this end is Luis D. Verde Arregoitia’s curated list of R packages and other resources: ‘Large Language Model tools for R’ (also available in Spanish). I also highly recommend reading the “Read first” and “Further reading” sections.

Figure 15.5: Selection of hex logos of LLM-related R packages from Arregoitia (2026)

15.5 What’s next? 🧭

This textbook has taken you on a journey: first introducing Open Scholarship (Chapter 1), which forms the backbone of this textbook’s approach to doing science, then consolidating knowledge about file formats, file naming, and project organisation (Chapter 2 and Chapter 3), which are often major hurdles for successful data analysis pipelines. Having installed and set up R and RStudio (Chapter 4), you took your first steps in learning to code in R (Chapter 5). Next, you learnt to import real research data into an R project (Chapter 6). From Chapter 7 onwards, you learnt how to analyse data in R using descriptive and inferential statistics. This entailed wrangling data to prepare them for statistical analyses (Chapter 9) and data visualisation (Chapter 10), as well as developing an understanding of key statistical concepts such as measures of central tendency and variability (Chapter 8), distributions, effect sizes, confidence intervals, and p-values (Chapter 11). Chapter 12 introduced statistical modelling with simple linear regression and Chapter 13 expanded this concept to multiple predictor variables and interactions between predictors. Having mastered this foundational knowledge, you are now ready to tackle more advanced statistical methods such as mixed-effects and non-linear regression models, logistic and other classification models, machine learning algorithms, and much more (see Next-step resources). In Chapter 14, you learnt how to apply literate programming skills to conduct and publish reproducible research in Quarto. Finally, in this concluding chapter, you reflected on the values of critical thinking, human learning, and of nurturing communities in the day and age of AI.

Learning statistical theory and practice are inseparable from scientific reasoning (Vasishth & Gelman 2021: 1312).

I personally believe that it will always be worth investing in learning complex matters, developing critical thinking skills, as well as in building meaningful human relationships. No matter how powerful and efficient future AI products may be, the skills that will be valued in the future will not be “advanced” prompting techniques, but rather social skills, critical (statistical) literacy, and computational thinking. Crucially, these are also the skills that are essential to make the most of AI tools.

The resources listed in the Appendix are an excellent starting point for continuing your learning journey. In addition, the online version of this textbook features an expanding collection of case study chapters co-authored by students from my ‘Introduction to Data Analysis in R’ class. Each chapter attempts to computationally reproduce the findings of a published linguistics study using the authors’ original data. The student authors document all the steps necessary to reproduce the results and discuss the success (or not!) of their reproductions. Retracing the steps outlined in a case study that interests you, or that covers a method you’d like to learn more about, is a great way to consolidate and expand the skills and knowledge you have acquired so far.

Two cute little monsters decorating the R logo with hearts, flowers, and smilies.

Happy learning and good luck with all your data analysis pRojects!
Artwork by Allison Horst (CC BY 4.0)

  1. I highly recommend Andrew Heiss’ blog and his beautiful teaching resources. Fun fact for language students and linguists: Andrew majored in Arabic and Italian and didn’t learn about statistics or R until he started his second master’s!↩︎