4 Installing R
and RStudio
Chapter overview
This chapter is designed to help you get started using R
and RStudio, assuming no prior use of either. We will be covering the following topics:
- Why it’s worth learning
R
- Downloading
R
and RStudio - Setting up RStudio
- Using the Console in RStudio
- Installing and loading
R
packages - Accessing help files
- Citing packages
If you already have some experience of using R
and RStudio, please ensure that both are up-to-date. Whilst parts of this chapter will likely be revision, others may be the opportunity to learn some new tips about setting up and using R
in RStudio, installing and citing packages. Once you’ve skimmed through this chapter, feel free to swiftly move on to Chapter 5.
4.1 Why learn R
?
In short, because R
can do it all! 🙃 This statement is only a slight exaggeration: R
is indeed a highly versatile programming language and environment that allows us to do a multitude of tasks relevant to the language sciences. These include data handling and processing, statistical analysis, creating effective and appealing data visualisations, web scraping, text analysis, generating reports in various formats, designing web pages, and interactive apps, and much, much more! 💪
Whilst some will claim that R
has a steep learning curve, this textbook aims to prove that the opposite is true! Whilst it’s fair to say that, as with all new things, it will take you a while to get the hang of it, once you’ve got started, you will see that your possibilities are (pretty much) endless and that learning how to do new things in R
makes for fun and very rewarding challenges. What’s more, this textbook introduces the {tidyverse} approach to programming in R
, which is particularly accessible to beginners. We will also use RStudio to access R
, which makes things considerably more intuitive and generally easier to work with.
What’s more, both R
and the RStudio Desktop
version that we will be using are free and open source (see Section 1.2), which means that they are accessible to all, regardless of their institutional affiliation or professional status. This is in contrast to proprietary statistical software such as SPSS, for which you or your university needs to buy an expensive license. To get started in R
, all you will need is access to the internet, a computer (unfortunately, a tablet will not suffice), and the intrinsic motivation to work your way through the basic skills taught in this textbook.
[U]sing R - it’s like the green and environment-friendly gardening alternative to buying plastic wrapped tomatoes in the supermarket that have no taste anyway. (Martin Schweinberger 2022)
Last but not least, in choosing to learn R
, you are entering a vibrant community of users. As an open-source programming environment, R
is the product of many different people’s contributions. Everyday, new packages, functions, and resources are being developed, improved, and shared with the community. Given that R
has evolved into one of the most popular languages for scientific programming (and has become “the de facto standard in the language sciences” Winter 2019: xiii), many of these have been created by scientists and are particularly well-suited to research workflows. Moreover, the R
community is known for being welcoming, supportive, and inclusive (sadly, the same cannot be said of all communities in the computing world). This is reflected in the strong presence of many community-led initiatives such as RLadies and RainbowR, which encourage under-represented groups to participate in and contribute to the R
community. 🤗
“Look, I am studying languages so why should I learn to code?” 🤔
Using scripts rather than GUI software will help you make your research less error-prone, more transparent, and sustainable. Being open-source, there are no restrictions as to who can run R
code and older versions are available ensuring that exact reproduction is possible, even years later. As many other language scientists use R
, you will be able to collaborate with others and understand other researchers’ R
code. As we will see in a future chapter, in RStudio, it is also very easy to export R
code and share your scripts, for example as part of an appendix to your research publication, in various formats (including .html
that can be opened in any browser and .pdf
).
In addition, learning to code in R
is an excellent way to understand the basics of data literacy and statistical reasoning. These are skills that are highly valued among employers, both in academia and the industry. Many companies, public institutions (e.g., ministries, hospitals, national agencies) and NGOs hire data scientists who often work in R
. And, even if you end up doing little to no coding yourself, understanding the basic principles of programming is undoubtedly a highly useful skill in the modern world.
Some of you may be wondering whether you should be learning Python
rather than R
. Both are widely used languages in scientific programming and data science. At the time of writing, there are more resources specifically aimed at linguists and education researchers in R
than there are in Python simply because it is currently the most widely used language in these disciplines. Should you wish to learn Python
at a later stage, many of the same principles that you will have learned in this textbook will apply: it should feel somewhat like learning Italian when you already speak Spanish or French.
4.2 Installing R and RStudio
4.2.1 What are R
and RStudio? And why do I need both?
As a beginner, it’s easy to confuse R
and RStudio, but it’s important to understand that they are two very different things. R
is a programming environment for statistical computing and graphics that uses the programming language R
. Think of it as the engine with which we will learn to perform lots of different tasks. RStudio, by contrast, is a set of tools, a so-called ‘integrated development environment’ (IDE). It makes working in R
much more intuitive and efficient. If R
is the engine of our car, you can imagine RStudio as our dashboard. Hence, even though we will later on appear to only be working in RStudio, R
will actually be doing the heavy-lifting, under the hood.
R
At the time of writing, RStudio is the most widely used Integrated Development Environment (IDE) to work in R
. However, it is worth noting that many other IDEs that can be used to access R
. These include:
Whilst this textbook will assume that everyone is working in RStudio, if you are already familiar with another IDE that works well with R
, you are welcome to continue working in that IDE. Each IDE has a different feel to it and offers different functions so, ultimately, it’ll be up to you to find the one that suits you best!
4.2.2 Installing R
Go to the website of the Comprehensive R Archive Network (CRAN): https://cran.r-project.org.
Click on the “Download R for …” link that matches your operating system (Linux, macOS or Windows), then:
- For Windows, click on the top ‘base’ link, also marked as “install R for the first time” (Note that you should also use this link if you are updating your R version). On the next page, click on the top “Download R” link.
- For MacOS, click on either the top
.pkg
link if you have an Apple silicon Mac (e.g., M1, M2, M3) or the second.pkg
link, if you have an older Intel Mac. - For Linux, click on your Linux distribution and then follow the instructions on the following pages.
Once you have downloaded one of these
R
versions, navigate to the folder where you have saved it (by default, this will be your Downloads folder), and double click on the executable file to installR
.Follow the on-screen instructions to install
R
.Test that
R
is correctly installed. On Windows and MacOS, navigate to your Applications folder and double click on theR
icon. On Linux, open upR
by typingR
in your terminal. This should open up an R Console. You can type R commands into the Console after the command prompt>
. Type the following R code after the command prompt and then press enter:plot(1:10)
.
✅ If you see the plot above, you have successfully installed and tested R
and you can go on to installing RStudio.
⚠️ If that’s not the case, make a note of the errors produced (copy and paste them into a text document or take a screenshot) and search for solutions on the internet. It is very likely that many other people have already encountered the same problem as you and that someone from the R
community has posted a solution online.
The aim of this chapter is to install both R
and R Studio
on your own computer so that you can write and run your own scripts locally (i.e., on your own computer without the need for an internet connection). In some cases, however, this might not be possible. For example, because the programmes are not available for your operating system, or because you do not have admin rights on your computer, or because your disk is full and you cannot delete anything. None of these situations are ideal to do research, but don’t give up on learning R
: there is an alternative!
You can sign up to Posit Cloud. Posit Cloud will allow you to run R
in RStudio in a browser (e.g., Firefox or Chrome) without having to install anything on your computer. Although Posit Cloud’s free plan is limited, it will suffice to learn the contents of this textbook. You will be able to follow the textbook in exactly the same way as everyone else. However, you will need a stable internet connection and you may find that you need to be a bit more patient as things are likely to run a little slower. If you decide to opt for the Posit Cloud solution, create a free account and then go straight to Setting up RStudio.
4.2.3 Installing RStudio
When you head over to their website, it may be confusing to you that the company that provides RStudio, Posit, also offers paid-for versions of RStudio and other paying services. Do not worry, we will not need any of these: These are products designed for companies and large organisations. The version of RStudio Desktop
that we will be using, however, is completely free and, given that it is open source, even if Posit decided to stop working on this product one day, others in the R
community would take over. Such is the beauty of open-source software! 🤗
Head over to this page https://posit.co/download/rstudio-desktop/ to download the latest version of
RStudio Desktop
.As you have already installed
R
, you can jump straight to the section entitled “2: Install RStudio”. The website should have detected which operating system your computer is running on, so that you can most likely simply click on the “Download RStudio Desktop…” button. Your download should start straight away.- If an incorrect operating system is detected, simply scroll down the page to find your operating system and download the corresponding version of
RStudio.
- If an incorrect operating system is detected, simply scroll down the page to find your operating system and download the corresponding version of
Once you have downloaded RStudio, navigate to the folder where the downloaded file has been saved (by default, this will be your Downloads folder), and double-click on the executable file to install RStudio.
Follow the on-screen instructions to install RStudio.
If you run into any issues that you cannot solve with existing online posts, the Posit Community forums are a good place to ask for help.
4.3 Setting up RStudio
From now on, we will only be accessing R
through RStudio. When you open up RStudio for the first time, you might find the layout rather intimidating. The application window is divided into several sections, which we call ‘panes’. Each pane also has several tabs. Although it may seem overwhelming at first, you will soon see that these different panes and tabs will actually make life much easier.
4.3.1 Global options
Before we get staRted properly, however, we need to change some of the default settings of RStudio. The first set of changes that we are going to make ensure that, each time we launch a new R
session in RStudio, we start afresh.
To do so, head over to the ‘Tools’ drop-down menu and click on ‘Global Options’. Make sure that the first three boxes are unticked (see Figure 4.5 (a)). Under “Save workspace to .RData on exit”, select the option “Never”. Always starting afresh is good programming practice. It avoids any problems being carried over from previous R
sessions. You can think of it like cooking in a freshly cleaned, tidy kitchen. It’s much safer than preparing a meal in a messy, possibly even contaminated kitchen!
Ctrl/Command + ,
Next, under the ‘Global Options’ tab ‘Code’ of the ‘Global Options’ window, ensure that the fourth option “Use native pipe operator” is ticked (see Figure 4.5 (b)). This is a new feature in R
that is very useful so we will make use of it in this textbook. The other options are not relevant for now.
Finally, head over to the ‘Pane Layout’ tab. From here, you can rearrange the panes of your RStudio window. To do so, click on the ﹀ symbols to get a drop-down menu corresponding to each pane. You can also select which tabs you would like to see in each pane. If you are already familiar with RStudio, feel free to stick to your favourite set-up. Personally, I use the panes layout below and, if you are new to R
, I recommend that you select this layout, too. You can always go back to these settings to change this setup at any stage. Don’t forget to click on ‘OK’ at the bottom of the ‘Global Options’ page to save your settings. Then, the panes in your RStudio window should be ordered as in Figure 4.6 (b).
4.3.2 Testing RStudio
It is now time we tested whether RStudio is communicating well with R
. To do so, let’s run the same test as in the R Console
. This time, head over to the ‘Console’ tab in the top right pane of your RStudio window and, after the command prompt >
, type: plot(1:10)
and then press enter. You should see the same plot as earlier on (see Figure 4.4 (b)), appearing in the Plots tab of the bottom-right pane of your RStudio window.
If you get the following error message Error in plot.new() : figure margins too large
, this is because your bottom-right pane is hidden from view or too small for the plot to be printed there. Click on the small two-window icon in the bottom-right corner if it is hidden (see Figure 4.7 (a)). Or, if it is too small, click on the dividing line between the two right-hand side panes and, whilst still holding down the mouse button, drag up the line until it is about halfway up. Then, re-type the command plot(1:10)
in the Console pane and press enter again. The plot should appear as in Figure 4.7 (b).
4.4 Installing R
packages
4.4.1 What are packages?
You now have a base installation of R
. Base R
is very powerful and comes with many standard packages and functions that R
users use on a daily basis. If you click on the Packages tab in the bottom-right pane and scroll down, you will see that there are many packages available. Only a few are selected. These are part of the base R
installation.
You can think of base R
as a fully functional student kitchen. It is rather small and only has the most essential ingredients and equipment, but it still has everything you need to cook simple, delicious meals. Downloading and installing additional packages is like buying fancier ingredients (i.e., packages with datasets) or more sophisticated and specialised kitchen devices (i.e., packages that provide additional functions).
In addition to the members of the R Core Team who develop and maintain base R
, thousands of R
users develop and share additional R
packages every day. These enable us to vastly increase the capacities of base R
. Packages are a very helpful way to bundle together a set of functions, data, and documentation files so that other R
users can easily download these bundles and add them to their local R
installation.
Throughout this textbook, the names of packages will be enclosed in curly brackets like this: {ggplot2}.
Q4.1 Which of these packages is not part of base R
?
Q4.2 Is it possible to create an R
package that provides access to the full texts of all of Jane Austen’s published novels for computational text analysis in R
?
Q4.3 Is the {janeaustenr} package installed as part of base R
?
4.4.2 Installing packages
To install a package, you will first need to download it from the internet. Packages are typically stored on different websites (online repositories), but the most trustworthy one and easiest to work with is CRAN (Comprehensive R Archive Network). To install the {janeaustenr} package from CRAN, simply type the following command in the Console pane and then type enter: install.packages("janeaustenr")
.
This command will take a few seconds to run (or longer depending on how slow your internet connection is). You should then see a message in red in the Console indicating (among other things that you can ignore) that the package has been successfully downloaded and its size (here: 1.5 megabyte), as well as the path to where the package’s content has been saved on your computer (see Figure 4.8). You do not need to worry about any of the other information.
To check that the package has been successfully downloaded and installed, head over to the Packages tab of the bottom-right pane and scroll down to the {janeaustenr} package, or search for it using the search window within this same tab. The {janeaustenr} package should now be visible, which tells us that the package is installed on your computer. Note, however, that the checkbox next to it is currently empty. This means that the package hasn’t been loaded in our current R
session and therefore cannot be used yet.
There are other ways to install packages, e.g., from Bioconductor and GitHub.
To find out more, read Section 1.5 from Douglas et al. (2024), which is available as an Open Educational Resource (see Chapter 1).
4.4.3 Loading packages
If you want to use the fancy ingredient or new kitchen device that was delivered in the package that you installed, you first need to get it out of the fridge or the cupboard and place it on your kitchen counter. This is the equivalent of “loading” a package. The command to load a package is library()
. This is because, rather confusingly, once they are unpacked (i.e., installed), packages are usually referred to as libraries.
To load the {janeaustenr} package, enter the following command in the Console:
library(janeaustenr)
If you correctly installed the package and have not misspelt the command, it may look like nothing has happened, as the Console returns nothing (see first red ellipse on Figure 4.9). However, if you go back to your Packages tab and scroll down to the {janeaustenr} package, you will see that the box next to it is now ticked (see second ellipse on Figure 4.9). This means that the package is loaded and ready to be used.
Note that whilst you only need to install each package once, you will need to load it every time we want to use it in a new R
session. This is because, when we close R
and start a new R
session, our kitchen is perfectly clean and tidy and everything is back in storage. And the good news is that we don’t even need to do the washing-up! 🙃
4.5 Package documentation
To find out more about any package or function, simply use the command help()
or its shortcut ?
. For example, to find out more about the {janeaustenr} package, enter the command help(janeaustenr)
or ?janeaustenr
in the Console. The help file will open up in the Help tab of the bottom-right pane. It contains the name of the package and a short description, as well as the name of the package maintainer, Julia Silge, and some additional links.
One of these links takes us to the package creator’s GitHub repository. This is where we can find a source code for the package, should we want to check how it works under the hood, or amend it in any way. Click on this link and scroll down the package’s GitHub page to consult its README file. This document informs us that the package includes plain text versions of Jane Austen’s six completed, published novels and tells us under what name they are stored within the library. For example, to access Pride and Prejudice, we need to load the library object prideprejudice
.
R
cannot contain spaces or hyphens.Pick your favourite Jane Austen novel and enter its corresponding object name in the Console, e.g., emma
. The entire novel will be printed in the Console output! You can print only a few lines by selecting them within square brackets, e.g., the command emma[20:25]
will only print lines 20 to 25 of the object emma
(see Figure 4.10).
emma
(note that you can adjust the size of the Console pane to see more or less of the text at any one time).
To find out more about a dataset or function within a package, use the functions help()
or ?
, e.g., help(emma)
or ?emma
. In this case, the help file provides us with a short description of this object and a link to the original source from which the package creator obtained the novel (which is in the public domain, otherwise it would not be possible to share it in this way).
Q4.4 Which is the correct R
object name to access Jane Austern’s novel ‘Sense and Sensibility’?
Q4.5 What is the source for the R
object containing Jane Austern’s novel ‘Sense and Sensibility’?
Q4.6 What is the first word of the 66th line in the R
object containing Jane Austern’s novel ‘Sense and Sensibility’?
4.6 Citing R
packages
When we use a package that is not part of base R
, it is very important to reference the package properly. There are two main reasons for doing this. For a start, the people who create and maintain these packages largely do so in their free time and they deserve full credit for their incredibly valuable work and contribution to science. Hence, whenever you use a package for your research, you should cite it, just like you would other sources.
The help page of the {janeaustenr} package already informed us that the maintainer of the package is Julia Silge. To get a full citation, however, we should use the citation()
function. Enter citation("janeaustenr")
in the Console to find out how to cite this package.
Note that the recommended bibliographic reference also includes the package version, which is important for reproducibility as the package may evolve and someone wanting to reproduce your analysis (and this may well be future you!) will need to know which version you used. This is the second main reason why we should be diligent about citing the packages that we used. In a research report, thesis, or academic article, you could cite the {janeaustenr} package like this:
We used the janeaustenr package (Silge 2022) to access Jane Austen’s six published novels in R (R Core Team 2024).
You can see the full references by hovering on the in-text citation links or by going to the References section of this book.
You may also want to install the {report} package, which includes a number of useful functions for citing R
versions and R
packages:
::report_system() report
Analyses were conducted using the R Statistical language (version 4.4.1; R Core
Team, 2024) on macOS Sonoma 14.5
::cite_packages() report
- Makowski D, Lüdecke D, Patil I, Thériault R, Ben-Shachar M, Wiernik B (2023). "Automated Results Reporting as a Practical Tool to Improve Reproducibility and Methodological Best Practices Adoption." _CRAN_. <https://easystats.github.io/report/>.
- Moroz G (2020). _Create check-fields and check-boxes with checkdown_. <https://CRAN.R-project.org/package=checkdown>.
- R Core Team (2024). _R: A Language and Environment for Statistical Computing_. R Foundation for Statistical Computing, Vienna, Austria. <https://www.R-project.org/>.
- Silge J (2022). _janeaustenr: Jane Austen's Complete Novels_. R package version 1.0.0, <https://CRAN.R-project.org/package=janeaustenr>.
- Xie Y (2024). _knitr: A General-Purpose Package for Dynamic Report Generation in R_. R package version 1.47, <https://yihui.org/knitr/>. Xie Y (2015). _Dynamic Documents with R and knitr_, 2nd edition. Chapman and Hall/CRC, Boca Raton, Florida. ISBN 978-1498716963, <https://yihui.org/knitr/>. Xie Y (2014). "knitr: A Comprehensive Tool for Reproducible Research in R." In Stodden V, Leisch F, Peng RD (eds.), _Implementing Reproducible Computational Research_. Chapman and Hall/CRC. ISBN 978-1466561595.
::report_packages() report
- report (version 0.5.9; Makowski D et al., 2023)
- checkdown (version 0.0.12; Moroz G, 2020)
- R (version 4.4.1; R Core Team, 2024)
- janeaustenr (version 1.0.0; Silge J, 2022)
- knitr (version 1.47; Xie Y, 2024)
To find out more, it is also worth reading Steffi LaZerte’s blog post on “How to cite R and R packages”: https://ropensci.org/blog/2021/11/16/how-to-cite-r-and-r-packages/.
4.7 Keeping things up to date ✨
As with all software, it is a good idea to keep your installations of RStudio and R
up-to-date. New features are constantly being added, bugs are fixed and updates may include important security patches.
4.7.1 Updating RStudio
By default, RStudio will let you know when a new version is available in a pop-up window. To update RStudio simply follow the same instructions as for the first installation (see Section 4.2.3). This time, however, when you add RStudio to your apps, you will get a pop-up message warning you that an older version of this programme already exists on your computer (see Figure 4.11). You can safely click on the option “Replace”. All of your previous Global Options settings will be transferred to your updated RStudio version so this should be a quick-and-easy process.
You can also check which version of RStudio you are running by clicking on the “Help” menu in RStudio’s top toolbar and then selecting the option “About RStudio”. In the “Help” drop-down menu, you also have an option to “Check for Updates”.
4.7.2 Updating R
Updating R
is a little more complex because you will also need to update all of your R
packages. Some of the packages that you use may not (yet) be available for the latest R
version. This is why, for beginners, I do not recommend updating R
in the middle of a project. That said, it is a good idea to keep your R
version up-to-date. To find out which version of R
you are currently working with, run this command in the Console.
R.version.string
[1] "R version 4.4.1 (2024-06-14)"
Compare this version number with the number of the latest version available on CRAN (see Figure 4.12). If the version that you are running is not the same as the latest R
version available on CRAN, you might want to update it. As a rule of thumb, it is a good idea to do an update if your version is more than six months old. To proceed with the update, close RStudio on your computer. Then, follow the same instructions as for the first-time installation of R
(see Section 4.2.2).
4.7.3 Updating R
packages
Once you have updated R
, it is important that you also update your installed packages. To do so, run the following command in the Console:
update.packages(ask = FALSE, checkBuilt = TRUE)
Alternatively, you can also go to the Packages tab of RStudio and click on the button “Update”. A pop-up window will appear with a list of the packages that need updating. Click on “Select All” and “Install Updates”.
Note that, if you have installed a lot of packages, this updating operation could take a while. It requires a stable internet connection and a bit of patience. 🧘🏾
R
using {installr} (for Windows only)
The {installr} package simplifies updating R
on Windows. To install the package use the usual commands:
install.packages("installr") # Run this command the first time you use the package.
library(installr) # Run this command everytime you want to update R using this package.
Then, run the updateR()
function, which automates the updating process by detecting your current R
version, comparing it with the latest available version, and guiding you through the process of downloading and installing the latest version.
It is also possible to customise the update process with arguments like updateR(update_packages = FALSE)
to skip package updates. For more details, check the documentation using the command ?updateR
.
Check your progress 🌟
You have successfully completed 0 out of 6 questions in this chapter.
Are you confident that you can…?
Once you have successfully completed all the steps outlined in this chapter, you are ready to get staR
ted with Chapter 5, which provides a hands-on introduction to R
in RStudio.