Week 3: R / RStudio practicals
R/RStudio Practical Part 1
This worksheet contains a sequence of short examples and exercises that demonstrate basic R workflows: loading datasets, exploring data, making simple visualisations, importing data from files and the web, and working with JSON.
Load the built-in “women” dataset that contains the height and weight of a sample of women and inspect it.
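A minimal sketch of this step, using only base R. The choice of inspection functions (str, head, summary) is an assumption; the worksheet does not say which ones it intends.

```r
# Load the built-in dataset into the workspace (it ships with base R)
data(women)

# Inspect it: structure, first rows, and summary statistics
str(women)
head(women)
summary(women)
```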
Explore the distribution of a continuous variable using a histogram.
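For example, a default histogram of one column might look like this. Using the height column is an assumption; weight would work the same way.

```r
# Histogram of a continuous variable with default settings
hist(women$height)
```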
View the help page for hist to see what arguments you can modify.
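One way to do this from the console is sketched below. The call to args is an optional extra, not part of the original instructions.

```r
# Open the help page for hist (both forms are equivalent)
?hist
help("hist")

# Optional: list the arguments and defaults of the default method
args(hist.default)
```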
Modify the histogram (example with custom breaks, title and x label).
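A possible version is sketched below. The number of breaks, the title and the label text are illustrative choices, not values taken from the worksheet.

```r
# Customised histogram: suggested number of bins, a title and an x-axis label
hist(women$weight,
     breaks = 5,                    # a suggestion; R may adjust it to "pretty" values
     main   = "Weight of women",    # plot title
     xlab   = "Weight (lbs)")       # x-axis label
```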
Display graphs side-by-side (set plotting area to 1 row and 2 columns), then show two histograms.
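A short sketch of the side-by-side layout follows. The titles and axis labels are illustrative.

```r
# Split the plotting area into 1 row and 2 columns
par(mfrow = c(1, 2))

# The next two plots fill the two panels from left to right
hist(women$height, main = "Height", xlab = "Height (in)")
hist(women$weight, main = "Weight", xlab = "Weight (lbs)")
```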
To display one graph at a time, reset the plotting area to a single panel with par(mfrow = c(1, 1)), then plot the dataset.
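As a sketch, resetting the layout and plotting the whole data frame could look like this. Calling plot on a two-column data frame draws a scatter plot of one column against the other.

```r
# Back to a single panel (1 row, 1 column)
par(mfrow = c(1, 1))

# Scatter plot of weight against height
plot(women)
```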
Importing data into R / RStudio
View the available datasets and load the tidyverse, then read the example TSV/CSV files. The read_* functions accept options such as col_names = FALSE (treat the first row as data rather than a header) and col_types (set the column types explicitly).
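The sketch below illustrates these options. The worksheet's actual file names are not given here, so the code writes two small stand-in files to a temporary directory; the file contents and the column names school_id and pct_fsm are hypothetical.

```r
library(tidyverse)

# List the datasets available in the loaded packages
data()

# Hypothetical stand-in files (replace the paths with the real course files)
tsv_path <- tempfile(fileext = ".tsv")
csv_path <- tempfile(fileext = ".csv")
writeLines(c("year\tprevalence", "2000\t0.8", "2010\t0.6"), tsv_path)
writeLines(c("A001,12.5", "B002,30.1"), csv_path)   # no header row

# Tab-separated file with a header row
hiv <- read_tsv(tsv_path)

# CSV without a header: columns are named X1, X2, ...
meals <- read_csv(csv_path, col_names = FALSE)

# Same file, supplying column names and types explicitly
meals_typed <- read_csv(
  csv_path,
  col_names = c("school_id", "pct_fsm"),
  col_types = cols(school_id = col_character(), pct_fsm = col_double())
)
```

Passing col_types also silences the column specification message that readr prints when it has to guess the types.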
Exercise
Using the visualisations you saw previously, explore the two datasets you have loaded (free school meals and HIV prevalence). For example, compare the distribution of HIV prevalence across different years using side-by-side histograms.
Read HTML and XML data
Install and load the rvest package, then read a Wikipedia page and extract headings, paragraphs, links, tables and images as examples of web scraping.
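The extraction steps can be sketched on a tiny inline page, so the example runs without a network connection. The HTML content here is made up; for the practical itself, replace the minimal_html(...) call with read_html() on the URL of the Wikipedia page you are given.

```r
# install.packages("rvest")   # run once if the package is not installed
library(rvest)

# A small made-up page standing in for the real web page
page <- minimal_html('
  <h1>Example page</h1>
  <p>First paragraph with a <a href="https://www.r-project.org/">link</a>.</p>
  <img src="logo.png" alt="logo">
  <table>
    <tr><th>name</th><th>value</th></tr>
    <tr><td>a</td><td>1</td></tr>
  </table>
')

headings   <- page %>% html_elements("h1")  %>% html_text2()       # heading text
paragraphs <- page %>% html_elements("p")   %>% html_text2()       # paragraph text
links      <- page %>% html_elements("a")   %>% html_attr("href")  # link targets
images     <- page %>% html_elements("img") %>% html_attr("src")   # image sources
tables     <- page %>% html_elements("table") %>% html_table()     # list of tibbles

headings
tables[[1]]
```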
Read JSON data
Install and load jsonlite, parse a small JSON string, convert the result back to JSON (a round trip), and read a remote JSON resource (Citibike station information). Then explore the resulting list/dataframe structure and make plots from the flattened data.
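The sketch below walks through the same steps on an inline JSON string, so it runs offline. The field names (stations, totalDocks, location and so on) are assumptions chosen to resemble a station feed and may not match the real resource; to read remote data, pass the feed URL to fromJSON instead of the string.

```r
# install.packages("jsonlite")   # run once if the package is not installed
library(jsonlite)

# Hypothetical JSON with a nested structure (a list containing a data frame)
json_text <- '{
  "source": "example",
  "stations": [
    {"name": "A", "totalDocks": 15, "location": {"lat": 40.7, "lon": -74.0}},
    {"name": "B", "totalDocks": 20, "location": {"lat": 40.8, "lon": -73.9}},
    {"name": "C", "totalDocks": 31, "location": {"lat": 40.6, "lon": -74.1}}
  ]
}'

# Parse: the top level becomes a list, the array of objects a data frame
parsed <- fromJSON(json_text)
str(parsed)
class(parsed$stations)

# Round trip back to JSON text
toJSON(parsed, pretty = TRUE, auto_unbox = TRUE)

# Flatten the nested location column into location.lat and location.lon
# (jsonlite:: prefix avoids a clash with purrr::flatten if tidyverse is loaded)
flat <- jsonlite::flatten(parsed$stations)
names(flat)

# Plot a column of the flattened data
hist(flat$totalDocks, main = "Total docks per station", xlab = "Docks")
```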
Exercise
Using the visualisations you have learned, show the distribution of available bikes and compare the number of available docks with the total number of docks (note that these are columns of a dataframe that is itself stored inside a list).
Additional info
Check and increase how many elements R will print to the console.
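A minimal sketch is shown below. The new limit of 10000 is an arbitrary illustrative value.

```r
# Check the current limit on the number of entries printed to the console
getOption("max.print")

# Raise the limit for the current session
options(max.print = 10000)
```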