Ch. 14: Tasks & Quizzes – Data Analysis for the Language Sciences

Your turn!

Watch this video from 2019, in which Garrett Grolemund (data scientist and instructor at Posit, the company behind RStudio) explains why literate programming is key to improving medical science, data science, and ultimately all empirical research endeavours. Be aware that everything that Garrett says about R Markdown is also true of the newer format Quarto.

Q14.1 What is meant by the replication crisis?

An approach that integrates external criticism by colleagues and peers into the research process.
The application of statistical principles to arrive at well-founded (i.e. likely to correspond accurately to the real world) concepts, conclusions or measurements.
An aphorism describing the pressure researchers feel to publish academic manuscripts, often in high prestige academic journals, in order to have a successful academic career.
The tendency to report only significant results in the abstract, while reporting non-significant results within the main body of the manuscript (not reporting non-significant results altogether would constitute selective reporting).
The finding, and related shift in academic culture and thinking, that a large proportion of scientific studies published across different disciplines do not replicate.
A set of good research practices based on fundamental principles: honesty, reliability, respect and accountability.


Q14.2 Which stages of the research process are potential sources of uncertainty?


Q14.3 Which of these aspects are necessary for a linguist to fully understand the conclusions of another linguist’s quantitative study?


Your turn!

Q14.4 In this task, you will practice using RStudio’s Visual mode to format text in a Quarto document.

On a new line after the closing --- of the YAML header, paste the introduction text below.
Using the Quarto editing toolbar, format the text so that, in the Visual mode, it looks like the text displayed in the screenshot below.
Render the document and compare how it is formatted in the HTML version.

Introduction

The aim of this report is to reproduce the descriptive statistics reported in Dąbrowska (2019: 5-6) using the original datasets (Dąbrowska 2019: Appendix S4):

Method

Participants

Ninety native speakers (42 male and 48 female) and 67 nonnative speakers of English (21 male and 46 female) were recruited through personal contacts, church and social clubs, and advertisements in local newspapers. Participants were told that the purpose of the study was to examine individual differences in native and nonnative speakers’ knowledge of English and whether these differences are related to their linguistic experience and abilities. All participants signed a written consent form before the research commenced.

The L1 participants were all born and raised in the United Kingdom and were selected to ensure a range of ages, occupations, and educational backgrounds. The age range was from 17 to 65 years (M = 38, SD = 16). Twenty-two percent of the participants held manual jobs, 24% held clerical positions, and 28% had professional-level jobs or were studying for a degree; the remaining 26% were occupationally inactive (i.e. unemployed, retired, or homemakers). In terms of education, participants’ backgrounds ranged from no formal qualifications to Ph.D., with corresponding differences in the number of years spent in full-time education (from 10 to 21; M = 14, SD = 2). Six participants reported a working knowledge of another language; the rest described themselves as monolinguals.

In RStudio’s Visual mode, what is the name of the formatting option that indents a quoted paragraph and adds a grey line to its left, as in the screenshot above?

Full solution to Q14.4

In Visual mode (see Figure 1 (a)), click on the “Normal” drop-down menu (see Figure 1 (b)) to change the formatting of the word Introduction to the “Header 1” style. To format the long citation, choose the “Blockquote” option from the “Format” drop-down menu (see Figure 1 (c)).

Figure 1: Text formatting using the Visual mode in RStudio

Your turn!

Switch to Source mode to view, in Markdown syntax, the text that you formatted in the Visual editor for Q14.4.

Q14.5 How is bold text written in Markdown?

Q14.6 How is a first-level heading written in Markdown?

Q14.7 How are block quotes formatted in Markdown?

Q14.8 How will the word ~~mystery~~ be formatted once the Markdown is rendered?

Your turn!

In your Quarto document, add a label to your first R chunk and render your document to HTML.

```{r}
#| label: setup

library(here)
library(tidyverse)
```

Q14.9 What is the output of the setup chunk in your rendered .html document?


Q14.10 Which code chunk option can you use to remove the two messages from the rendered version of your Quarto document, whilst still ensuring that the setup chunk is displayed and executed so that the libraries can be used in future code chunks?


Q14.11 Which code chunk option can you use to remove both the setup chunk and its outputs from the rendered version of your Quarto document, whilst still ensuring that the libraries are loaded so that their functions can be used further down in the document?
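Chunk options are not the only way to keep package messages out of a rendered document. As a minimal sketch of an R-level alternative (this is not the chunk option that Q14.10 asks for), the base R function `suppressPackageStartupMessages()` silences the messages that packages print when they are attached. The sketch assumes that the two messages in your setup chunk are package startup messages printed by here and tidyverse when they are loaded.

```r
# Attach the packages without printing their startup messages;
# the chunk itself can still be displayed and is still executed
suppressPackageStartupMessages({
  library(here)
  library(tidyverse)
})
```

Unlike a chunk option, this only affects startup messages: any other messages or warnings produced in the chunk would still appear.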

Your turn!

Q14.12 In your Quarto document, add a code chunk called L2-gender in which you compute the values necessary to complete the missing descriptive statistics in the sentence above. When rendered, your sentence should read:

90 native speakers (42 male and 48 female) and 67 nonnative speakers of English (21 male and 46 female) were recruited through personal contacts, church and social clubs, and advertisements in local newspapers.

Which value requires more than just one line of code?


Solution to Q14.12

To save the number of male L2 participants as an R object, we can follow the same procedure as above.

```r
L2.males <- L2.data |>  
  filter(Gender == "M") |>
  count()
```

For the number of female L2 participants, however, it’s not so simple because some are labelled “f”, while others are labelled “F” (see the section “Using across() to transform multiple columns”).

```r
table(L2.data$Gender)
```

```

 f  F  M 
 6 40 21
```

Below are four possible methods to solve this issue (and there are many more still!):

```r
# Method 1:
L2.Females <- L2.data |> 
  filter(Gender == "F") |> 
  count()

L2.females <- L2.data |> 
  filter(Gender == "f") |> 
  count()

L2.allfemales <- L2.Females + L2.females 

# Method 2:
L2.allfemales <- L2.data |> 
  filter(Gender == "F" | Gender == "f") |> 
  count()

# Method 3:
L2.allfemales <- L2.data |> 
  filter(Gender %in% c("F", "f")) |> 
  count()

# Method 4:
L2.allfemales <- L2.data |> 
  mutate(Gender = toupper(Gender)) |> 
  filter(Gender == "F") |> 
  count()
```

Some of these methods are perhaps more elegant than others, but they are all acceptable. After all, they all work! 🙃
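To convince yourself that the four methods really are equivalent, you could run them side by side. The sketch below does this on a small hypothetical data frame, `L2.mock`, which only reproduces the Gender counts printed by `table(L2.data$Gender)` above; it is not the real dataset, and the object names `m1` to `m4` are made up for this illustration.

```r
library(dplyr)

# Hypothetical stand-in for L2.data containing only a Gender column,
# rebuilt from the counts printed by table(L2.data$Gender): 6 "f", 40 "F", 21 "M"
L2.mock <- data.frame(Gender = c(rep("f", 6), rep("F", 40), rep("M", 21)))

# Method 1: count each spelling separately, then add the two counts
m1 <- (L2.mock |> filter(Gender == "F") |> count()) +
  (L2.mock |> filter(Gender == "f") |> count())
# Method 2: logical OR
m2 <- L2.mock |> filter(Gender == "F" | Gender == "f") |> count()
# Method 3: %in% with a vector of accepted labels
m3 <- L2.mock |> filter(Gender %in% c("F", "f")) |> count()
# Method 4: convert all labels to upper case first, then filter
m4 <- L2.mock |> mutate(Gender = toupper(Gender)) |> filter(Gender == "F") |> count()

# count() returns a one-row data frame with a column n; extract that column
results <- sapply(list(m1, m2, m3, m4), function(x) x$n)
results
```

All four values should be 46, i.e. the 6 participants labelled “f” plus the 40 labelled “F”.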

Once they are saved to the local environment, the values can be inserted inline in the usual way:


```markdown
`{r} nrow(L1.data)` native speakers (`{r} L1.males` male and `{r} L1.females` female) and `{r} nrow(L2.data)` nonnative speakers of English (`{r} L2.males` male and `{r} L2.allfemales` female) were recruited through personal contacts, church and social clubs, and advertisements in local newspapers.
```
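When the document is rendered, each inline R expression is replaced by the value it evaluates to. To preview that substitution in the console, you could assemble the same sentence with `sprintf()`. The sketch below uses hypothetical stand-ins, `L1.mock` and `L2.mock`, that contain only a Gender column rebuilt from the counts reported in this chapter; it assumes that L1 participants are coded “M” and “F”. It uses `nrow()`, which returns a plain integer, instead of `count()`, which returns a one-row data frame.

```r
library(dplyr)

# Hypothetical stand-ins for L1.data and L2.data (Gender column only)
L1.mock <- data.frame(Gender = c(rep("M", 42), rep("F", 48)))
L2.mock <- data.frame(Gender = c(rep("f", 6), rep("F", 40), rep("M", 21)))

# nrow() returns a single integer, which is convenient for inline code
n_L1 <- nrow(L1.mock)
L1.m <- nrow(filter(L1.mock, Gender == "M"))
L1.f <- nrow(filter(L1.mock, Gender == "F"))
n_L2 <- nrow(L2.mock)
L2.m <- nrow(filter(L2.mock, Gender == "M"))
L2.f <- nrow(filter(L2.mock, toupper(Gender) == "F"))  # merges "f" and "F"

# Each %d plays the role of one inline `{r}` expression
sentence <- sprintf(
  "%d native speakers (%d male and %d female) and %d nonnative speakers of English (%d male and %d female)",
  n_L1, L1.m, L1.f, n_L2, L2.m, L2.f
)
sentence
```

With these counts, the string matches the beginning of the target sentence given in Q14.12.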

Your turn!

👩🏾‍💻 Copy the code and text sections describing participants’ professional occupations and education from the section Inline code (in the textbox “More complex inline computations”) into your Quarto document and render it to HTML. Compare the values in your rendered document with the original ones from the published study (see below).

“Twenty-two percent of the participants held manual jobs, 24% held clerical positions, and 28% had professional-level jobs or were studying for a degree; the remaining 26% were occupationally inactive (i.e. unemployed, retired, or homemakers). In terms of education, participants’ backgrounds ranged from no formal qualifications to Ph.D., with corresponding differences in the number of years spent in full-time education (from 10 to 21; M = 14, SD = 2). Six participants reported a working knowledge of another language; the rest described themselves as monolinguals” (Dąbrowska 2019: 6).

Q14.13 Compare the rendered version of your document with the original descriptive statistics reported in Dąbrowska (2019: 6). Could you successfully reproduce these descriptive statistics? Which values are different?

Your turn!

Using any of the methods described above, add an in-text bibliographic reference to the following article in your Quarto document:

In’nami, Yo, Atsushi Mizumoto, Luke Plonsky & Rie Koizumi. 2022. Promoting computationally reproducible research in applied linguistics: Recommended practices and considerations. Research Methods in Applied Linguistics 1(3). 100030. https://doi.org/10.1016/j.rmal.2022.100030.

Specifically, we want to cite this passage from page 8:

As implementing these steps may seem daunting, we recommend that researchers engage in reproducible research incrementally. That may be one small step for a researcher, but it will represent a giant leap for the field of applied linguistics when consolidated and accumulated in the long run.

Q14.14 If the key to this article in the .bib file is innami2022, which in-text citation can be used to cite this specific page within a Quarto document?


Go to the Zotero style repository and download the .csl citation stylesheet to format references according to the 7th edition of the American Psychological Association (APA) style. Link this stylesheet to your Quarto document and render to HTML.

Q14.15 Now that your document includes references formatted in APA7, how are the authors’ names listed in your bibliography?

Your turn!

Q14.16 If you completed Q8.15 (Range), you copied-and-pasted a paragraph with gaps into LibreOffice Writer or Microsoft Word and then manually inserted descriptive statistics that you had calculated in R. This method of copying-and-pasting across different programmes is very error-prone: What if you accidentally pasted the wrong number in the wrong place? And what if there was an update to the dataset or you made some changes to the data cleaning procedure? You’d have to manually change all the numbers again! 🤯 This is time-consuming and, more worryingly, very likely to result in errors!

Now that you know about literate programming in Quarto, rewrite the following paragraph describing the GrammarR variable in L1.data and L2.data in a Quarto document. Use inline R code to fill the gaps and then render your paragraph to .docx or .odt format to check the results.

On average, English native speakers performed only marginally better in the English grammatical comprehension test (median = ______) than English L2 learners (median = ______). However, L1 participants’ grammatical comprehension test results ranged from ______ to ______, whereas L2 participants’ results ranged from ______ to ______.

Solution to Q14.16

Below is a screenshot of a Quarto document with the inline code chunks and its rendered .odt version as opened in LibreOffice Writer.

Quarto source code. Once rendered, the paragraph with inline code reads: “On average, English native speakers performed only marginally better in the English grammatical comprehension test (median = 76) than English L2 learners (median = 75). L1 participants’ grammatical comprehension test results ranged from 58 to 80. In this same test, L2 participants’ results ranged from 40 to 80.”

Rendered document as opened in LibreOffice Writer
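The screenshots do not show the inline code in text form. As a hedged sketch of which base R functions could fill the gaps, the block below applies `median()` and `range()` to made-up toy scores; `L1.toy` and `L2.toy` are hypothetical and are not the GrammarR data, so their values differ from the solution above. In your Quarto document, the same functions would be called inline on the real columns, e.g. `` `{r} median(L1.data$GrammarR)` `` or `` `{r} min(L2.data$GrammarR)` ``.

```r
# Made-up toy scores, NOT the real GrammarR values from Dąbrowska (2019)
L1.toy <- data.frame(GrammarR = c(55, 68, 72, 77, 79))
L2.toy <- data.frame(GrammarR = c(42, 50, 66, 71, 80))

# Medians fill the first two gaps
L1.median <- median(L1.toy$GrammarR)
L2.median <- median(L2.toy$GrammarR)

# range() returns c(minimum, maximum); min() and max() work equally well
L1.range <- range(L1.toy$GrammarR)
L2.range <- range(L2.toy$GrammarR)

sprintf("L1: median = %g, range %g to %g; L2: median = %g, range %g to %g",
        L1.median, L1.range[1], L1.range[2],
        L2.median, L2.range[1], L2.range[2])
```

If a column contained missing values, these functions would return NA unless called with `na.rm = TRUE`.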

Check your progress 🌟