Ch. 7: Tasks & Quizzes – Data Analysis for the Language Sciences

Your turn!

Q7.1 The View() function is more user-friendly than attempting to examine the full table in the Console. Try to display the full L2 dataset in the Console by using the command L2.data, which is shorthand for print(L2.data). What happens?

Your turn!

Q7.2 Six is the default number of rows printed by the head() function. Have a look at the function’s help file using the command ?head to find out how to change this default setting. How would you get R to print the first 10 lines of L2.data?

Your turn!

Q7.3 Which type of variable is stored in the Occupation column in L1.data?

Q7.4 Which type of variable is stored in the Gender column in L1.data?

Q7.5 Which type of variable is stored in the VocabR column of L1.data, which contains participants' scores on an English vocabulary test?

Your turn!

Compare the outputs of the str() and head() functions in the Console with that of the View() function to understand the different ways in which the same dataset can be examined in RStudio.
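
To see the difference before trying it on the real data, here is a minimal sketch with a small made-up data frame (the column names and values are hypothetical, not taken from Dąbrowska's datasets). View() is left commented out because it only works in an interactive session such as RStudio.

```r
# A small, made-up data frame with eight rows
# (hypothetical values, not the real L1 or L2 data)
toy.data <- data.frame(
  Participant = paste0("P", 1:8),
  Age = c(25L, 31L, 19L, 44L, 52L, 28L, 37L, 60L),
  Gender = c("F", "M", "F", "F", "M", "M", "F", "M")
)

# str(): one line per column, listing its type and its first few values
str(toy.data)

# head(): only the first six rows, printed as a table
head(toy.data)

# View() opens the full table in RStudio's data viewer;
# it needs an interactive session, so it is commented out here
# View(toy.data)
```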

Q7.6 Use the str() function to examine the internal structure of the L2 dataset. How many columns are there in the L2 dataset?

Q7.7 Which of these columns can be found in the L2 dataset, but not the L1 one?

Q7.8 Which type of R object is the variable Arrival stored as?

Q7.9 How old was the third participant listed in the L2 dataset when they first moved to an English-speaking country?

Q7.10 In both datasets, the column Participant contains anonymised participant IDs. Why is the variable Participant stored as a character vector in L1.data, but as an integer vector in L2.data?
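
When a data file is imported with a function such as read.csv(), R decides on each column's type by looking at the values it contains. The sketch below assumes two tiny, hypothetical CSV files (passed as text rather than read from disk) to show how a single value can change the type of a whole column. It assumes R 4.0.0 or later, where text columns are no longer converted to factors by default.

```r
# Two tiny, hypothetical CSV files, written inline with the text argument
ids.numbers <- read.csv(text = "Participant\n1\n2\n3")
ids.mixed <- read.csv(text = "Participant\n1\n2\nP3")

# read.csv() checks all the values in a column before choosing its type
class(ids.numbers$Participant)  # "integer": all values are whole numbers
class(ids.mixed$Participant)    # "character": "P3" cannot be read as a number
```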

Your turn!

The following two quiz questions focus on the NativeLg variable from the L2 dataset (L2.data).

Q7.11 Use the index operators to find out the native language of the 26th L2 participant.

Q7.12 Which command(s) can you use to display only the gender, occupation, native language, and age of the last participant listed in the L2 dataset?
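
If you are unsure how the index operators combine, here is a minimal sketch on a made-up four-row data frame (hypothetical values that only borrow the column names of the L2 dataset). Inside [ ], the number before the comma selects rows and the part after the comma selects columns; nrow() returns the number of rows, which is also the row number of the last participant.

```r
# A hypothetical mini-dataset (values made up for illustration)
toy.L2 <- data.frame(
  Gender = c("F", "M", "F", "M"),
  Occupation = c("Student", "Teacher", "Nurse", "Engineer"),
  NativeLg = c("Polish", "Spanish", "Mandarin", "German"),
  Age = c(24L, 35L, 29L, 41L)
)

# [row, column]: the native language of the 2nd participant
toy.L2[2, "NativeLg"]

# The same value: $ picks the column, then [ ] picks the 2nd element
toy.L2$NativeLg[2]

# Several named columns of the last row
toy.L2[nrow(toy.L2), c("Gender", "Occupation", "NativeLg", "Age")]
```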

Your turn!

Q7.13 How does the average age of the L2 participants in Dąbrowska (2019) compare to that of the L1 participants?

Your turn!

For this task, you first need to check that you have saved the following two variables from the L1 dataset to your R environment.

L1.Age <- L1.data$Age
L1.Occupation <- L1.data$Occupation

Q7.14 Below is a list of useful base R functions. Try them out with the variable L1.Age. What does each function do? Make a note by writing a comment next to each command (see Writing comments in scripts). The first one has been done for you.

mean(L1.Age) # The mean() function returns the mean (average) of a set of numbers.
min()
max()
sort()
length()
mode()
class()
table()
summary()

Q7.15 Age is a numeric variable. What happens if you try these same functions with a character string variable? Find out by trying them out with the variable L1.Occupation, which contains words (i.e. character strings) rather than numbers.

Solutions to Q7.14 and Q7.15

As you will have seen, often the clue is in the name of the function – but not always! 😉

The numbered notes below explain the output of each of the commands above, in the same order.

mean(L1.Age)
mean(L1.Occupation)

min(L1.Age)
min(L1.Occupation)

max(L1.Age)
max(L1.Occupation)

sort(L1.Age)
sort(L1.Occupation)

length(L1.Age)
length(L1.Occupation)

mode(L1.Age)
mode(L1.Occupation)

class(L1.Age)
class(L1.Occupation)

table(L1.Age)
table(L1.Occupation)

summary(L1.Age)
summary(L1.Occupation)

1: The mean() function returns the mean (average) of a set of numbers.
2: It does not make sense to calculate the mean of a set of words, so R returns NA (not available) together with a warning (in red) explaining that the mean() function expects a numeric or logical argument.
3: For a numeric variable, min() returns the lowest numeric value.
4: For a string variable, min() returns the first value in alphabetical order.
5: For a numeric variable, max() returns the highest numeric value.
6: For a string variable, max() returns the last value in alphabetical order.
7: For a numeric variable, sort() returns all of the values of the variable ordered from the smallest to the largest.
8: For a string variable, sort() returns all of the values of the variable in alphabetical order.
9: The function length() returns the number of values in the variable.
10: The function length() returns the number of values in the variable.
11: The function mode() returns the R data type that the variable is stored as.
12: The function mode() returns the R data type that the variable is stored as.
13: The function class() returns the R object class that the variable is stored as.
14: The function class() returns the R object class that the variable is stored as.
15: For a numeric variable, the function table() outputs a table that tallies the number of occurrences of each unique value in a set of values and sorts them in ascending order.
16: For a string variable, the function table() outputs a table that tallies the number of occurrences of each unique value in a set of values and sorts them alphabetically.
17: For a numeric variable, the function summary() outputs six values that, together, summarise the set of values contained in this variable: the minimum and maximum values, the first and third quartiles, and the mean and median (see Chapter 8).
18: For a string variable, the summary() function only outputs the length of the string vector, its object class and data mode.
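
If you would like to check these notes on something smaller than the L1 dataset, here is a minimal sketch with two short made-up vectors (hypothetical stand-ins for L1.Age and L1.Occupation). The last two lines go slightly beyond the task: they use a factor to show that mode() and class() do not always return the same answer.

```r
# Hypothetical stand-ins for L1.Age and L1.Occupation
ages <- c(25, 31, 19, 31)
jobs <- c("student", "teacher", "student", "nurse")

mean(ages)     # 26.5
mean(jobs)     # NA, with a warning: argument is not numeric or logical
min(jobs)      # "nurse": the first value in alphabetical order
max(jobs)      # "teacher": the last value in alphabetical order
sort(ages)     # 19 25 31 31
table(jobs)    # nurse 1, student 2, teacher 1 (sorted alphabetically)
mode(jobs)     # "character"
class(jobs)    # "character"
summary(jobs)  # Length, Class and Mode only

# For a factor, the two functions differ: factors are stored as integer codes
mode(factor(jobs))   # "numeric"
class(factor(jobs))  # "factor"
```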

Your turn!

Look at the following two lines of code and their (abbreviated) outputs.

L1.data$Vocab

[1] 73.33333 95.55556 95.55556 84.44444 88.88889 73.33333

round(L1.data$Vocab)

[1] 73 96 96 84 89 73

Q7.16 Based on your observations, what does the round() function do?

Q7.17 Check out the ‘Usage’ section of the help file on the round() function to find out how to round the Vocab values in the L1 dataset to two decimal places. How can this be achieved?
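
Once you have found the relevant argument in the help file, a minimal sketch like the one below (with hypothetical values shaped like the Vocab scores above) shows how it behaves. Two details from the ?round help page are worth knowing: negative values of digits round to tens, hundreds and so on, and a 5 is expected to be rounded to the even digit (following the IEC 60559 standard), so round(2.5) gives 2 rather than 3.

```r
# Hypothetical values, similar in form to the Vocab scores above
scores <- c(73.33333, 95.55556, 84.44444)

round(scores)               # 73 96 84 (whole numbers by default)
round(scores, digits = 1)   # 73.3 95.6 84.4
round(1234.5, digits = -2)  # 1200: negative digits round to the nearest hundred
round(2.5)                  # 2: a 5 is rounded to the even digit
```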

Your turn!

Q7.18 Using the R pipe operator, calculate the mean age of the L2 participants and round off this value to two decimal places. What is the result?
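
Here is a minimal sketch of the same pattern, using three made-up ages rather than the L2 data so that it does not give away the answer. It compares a nested function call with the piped version. The native pipe |> requires R 4.1.0 or later.

```r
# Hypothetical ages (not the L2 data)
toy.ages <- c(20, 25, 31)

# Nested calls are read from the inside out
round(mean(toy.ages), digits = 2)

# With the pipe, each result is passed on as the first argument of the next function
toy.ages |> mean() |> round(digits = 2)   # 25.33
```

Both lines compute the same value. The piped version simply lists the steps in the order in which they are carried out.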

Q7.19 Unsurprisingly, in Dąbrowska's (2019) study, English L1 participants, on average, scored higher on the English vocabulary test than L2 participants. Calculate the difference between L1 and L2 participants' mean Vocab test results and round off this mean difference to two decimal places.

Detailed answer to Q7.19

There are lots of ways to tackle this in R. Here is a first approach that involves the pipe operator:

(mean(L1.data$Vocab) - mean(L2.data$Vocab)) |> 
  round(digits = 2)

[1] 16.33

Note that this approach requires a set of brackets around the subtraction; otherwise, only the second mean value is rounded off to two decimal places. Compare the following lines of code:

mean(L1.data$Vocab) - mean(L2.data$Vocab)

[1] 16.33315

(mean(L1.data$Vocab) - mean(L2.data$Vocab)) |> 
  round(digits = 2)

[1] 16.33

mean(L1.data$Vocab) - round(mean(L2.data$Vocab), digits = 2)

[1] 16.3358
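
The effect of the brackets can be reproduced with two made-up numbers (hypothetical stand-ins for the two mean Vocab scores). In R, the native pipe |> binds more tightly than the binary minus sign, so without brackets only the value immediately to the left of |> is passed on to round().

```r
# Two hypothetical numbers standing in for the two group means
a <- 10.123
b <- 2.456

# Without brackets, only b is piped into round(): 10.123 - 2.46
unbracketed <- a - b |> round(digits = 2)

# With brackets, the difference (7.667) is computed first, then rounded
bracketed <- (a - b) |> round(digits = 2)

unbracketed   # 7.663
bracketed     # 7.67

# The unbracketed line is the same as rounding b on its own
identical(unbracketed, a - round(b, digits = 2))   # TRUE
```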

An alternative approach would be to store the difference in means as an R object and, in a second line of code, pass this object to the round() function.

mean.diff.vocab <- mean(L1.data$Vocab) - mean(L2.data$Vocab)

round(mean.diff.vocab, digits = 2)

[1] 16.33

Or, you could combine both approaches like this:

mean.diff.vocab <- mean(L1.data$Vocab) - mean(L2.data$Vocab)
mean.diff.vocab |> 
  round(digits = 2)

[1] 16.33

There is often more than one way to solve a problem in R. Choose whichever approach you are most comfortable with. As long as you understand what your code does (see Chapter 15, What's next? AI-assisted reseaRch?), it doesn't matter if your code isn't particularly elegant or efficient.

Check your progress 🌟