Data Cleaning and Processing
University of Sheffield
22 October 2025
Looking back: so far, we’ve learned several things …
Now, let’s move on to learn:
filter(), arrange(), select(), mutate(), summarise().
dplyr gives concise verbs for common tasks:
filter() - pick rows
arrange() - reorder rows
select() - pick columns
mutate() - create/transform columns
summarise() - collapse to summaries (often must be preceded by group_by())
We’ll use the mpg dataset, which ships with the ggplot2 package, for examples.
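As a quick illustration of the two verbs that do not get their own slide here, the sketch below applies select() and a group_by() + summarise() pair to mpg. The object names mpg_small and hwy_by_year, and the choice of columns, are assumptions made for this example only.

```r
library(dplyr)
library(ggplot2)  # loaded only because it provides the mpg dataset

# select(): keep a handful of columns (column choice is illustrative)
mpg_small <- mpg %>%
  select(manufacturer, model, year, hwy)

# group_by() + summarise(): collapse to one row per model year
hwy_by_year <- mpg %>%
  group_by(year) %>%
  summarise(
    mean_hwy = mean(hwy),  # average highway miles per gallon
    n_cars   = n()         # number of rows in each group
  )

hwy_by_year
```

Without group_by(), summarise() would return a single row for the whole table.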
Make sure you know some useful data inspection functions.
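One possible set of inspection calls is sketched below. Which functions count as the essential ones is a judgment call; these are common choices, not a list taken from the slides.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

head(mpg)     # first few rows
dim(mpg)      # number of rows and columns
names(mpg)    # column names
glimpse(mpg)  # compact, column-by-column overview (from dplyr)
summary(mpg)  # per-column summary statistics
```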
Read more about the mpg dataset in its help page (run ?mpg after loading ggplot2).
filter() - Select Observations/Rows
How do we select rows for a certain manufacturer, displ, and year?
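A minimal sketch of one answer, assuming we want Audi cars from 2008 with an engine displacement above 2 litres. The specific values are picked for illustration, not taken from the slides.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Conditions separated by commas are combined with AND
audi_2008 <- mpg %>%
  filter(manufacturer == "audi", displ > 2, year == 2008)

# Use %in% to match any of several values
audi_or_honda <- mpg %>%
  filter(manufacturer %in% c("audi", "honda"))

audi_2008
```

Note that filter() keeps only rows where the condition is TRUE, so rows where it evaluates to NA are dropped.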
arrange() - Reorder/Sort Rows
What is the base R counterpart of arrange()?
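One way to answer the question is to compare arrange() with base R's order() used inside row indexing. The sketch below sorts mpg by year and then by descending highway mileage; the sort columns are an arbitrary choice for this example.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# dplyr: sort by year ascending, then hwy descending
sorted_dplyr <- mpg %>%
  arrange(year, desc(hwy))

# base R: order() returns row positions, which we use to index the rows
sorted_base <- mpg[order(mpg$year, -mpg$hwy), ]

head(sorted_dplyr)
head(sorted_base)
```

The base R version needs the table name repeated for every column, which is part of why arrange() reads more cleanly in a pipeline.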
Note
mutate() doesn’t change the original data. Unless you assign the result to an object, the new column is only available to the functions piped after the mutate() call.
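The behaviour described in the note can be checked with a short sketch. The new column hwy_kpl (highway kilometres per litre, using an approximate conversion factor of 0.425) and the object mpg2 are hypothetical names introduced for this example.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# The new column exists inside this pipeline only
mpg %>%
  mutate(hwy_kpl = hwy * 0.425) %>%  # approximate mpg -> km/L conversion
  select(model, hwy, hwy_kpl) %>%
  head()

# The original data is untouched
"hwy_kpl" %in% names(mpg)   # FALSE

# Assign the result to keep the new column
mpg2 <- mpg %>%
  mutate(hwy_kpl = hwy * 0.425)

"hwy_kpl" %in% names(mpg2)  # TRUE
```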
Recommended reading material: http://r4ds.had.co.nz/relational-data.html


We have two tables: flights2 and airlines.
We want to have the full name of the airline in the flights2 table rather than the carrier abbreviation.
How to achieve this?
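One way to do this is a left join on the shared carrier column. The sketch below uses two small toy tables as stand-ins for flights2 and airlines so that it runs without extra packages; the rows are made up for illustration, and the real tables are assumed to share a carrier column in the same way.

```r
library(dplyr)

# Toy stand-in for flights2 (rows are illustrative only)
flights2 <- tibble(
  year    = 2013,
  month   = 1,
  day     = 1,
  carrier = c("UA", "AA", "B6"),
  dest    = c("IAH", "MIA", "BQN")
)

# Toy stand-in for airlines: abbreviation plus full name
airlines <- tibble(
  carrier = c("AA", "UA"),
  name    = c("American Airlines Inc.", "United Air Lines Inc.")
)

# left_join() keeps every row of flights2 and adds matching columns
flights_named <- flights2 %>%
  left_join(airlines, by = "carrier")

flights_named
```

A left join keeps every flight even when no airline matches: the "B6" row stays in the result with NA in the name column.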

Or keep NA and exclude 9999:
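A minimal sketch of this step, assuming 9999 is a placeholder code for "not recorded" in a hypothetical income column of a toy survey table. Neither the table nor the column name comes from the slides.

```r
library(dplyr)

# Toy data: 9999 is assumed to be a sentinel code, NA a true missing value
survey <- tibble(
  id     = 1:5,
  income = c(32000, 9999, NA, 41000, 9999)
)

# A plain comparison drops the NA row too, because NA != 9999 is NA
dropped_na <- survey %>%
  filter(income != 9999)

# Keep NA explicitly while excluding the 9999 sentinel
kept_na <- survey %>%
  filter(is.na(income) | income != 9999)

kept_na
```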
Missing values — simple imputation example.
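A simple imputation could look like the sketch below, which replaces missing values with the column mean. Mean imputation is only one option, chosen here as an assumption; the toy scores table and its column names are hypothetical.

```r
library(dplyr)

# Toy data with two missing scores
scores <- tibble(
  id    = 1:5,
  score = c(10, NA, 30, NA, 50)
)

imputed <- scores %>%
  mutate(
    # Record which values were missing before they are overwritten
    score_missing = is.na(score),
    # Replace NA with the mean of the observed values
    score = if_else(is.na(score), mean(score, na.rm = TRUE), score)
  )

imputed
```

Keeping a flag column such as score_missing makes it possible to tell imputed values from observed ones later in the analysis.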
Now it’s your turn to dive into the worksheet.
