Load required packages:
suppressMessages(library(tidyverse))
suppressMessages(library(gapminder))
suppressMessages(library(DT))
Choose EITHER “Univariate Option 1” or “Univariate Option 2”. Both of these problems have three components:
You are expected to use pivot_wider() and pivot_longer() for reshaping, and ggplot2 for the plot.
Regarding the plot:
widened <- gapminder %>%
  pivot_wider(id_cols = year, names_from = "country", values_from = lifeExp)
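The shape of widened can be checked directly: one row per year, and one column per country plus year. This is a minimal sketch that assumes the gapminder package is installed with its standard 142-country, 12-year data set.

```r
suppressMessages(library(tidyverse))
suppressMessages(library(gapminder))

# Same reshape as above: one row per year, one lifeExp column per country
widened <- gapminder %>%
  pivot_wider(id_cols = year, names_from = country, values_from = lifeExp)

# Expect 12 rows (one per year) and 143 columns (year + 142 countries)
dim(widened)
n_distinct(gapminder$year)
n_distinct(gapminder$country)
```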
The widened tibble has only 12 rows (one per year), as there are only 12 years in the gapminder data set.
widened_plot <- widened %>%
  ggplot(aes(Canada, China)) +
  geom_point(aes(color = as.factor(year))) +
  ggtitle("Life Expectancy from 1952 to 2007") +
  ylab("Life Expectancy in China (years)") +
  xlab("Life Expectancy in Canada (years)") +
  scale_color_discrete("Year") +
  theme(text = element_text(size = 18))
lengthened <- widened %>%
  pivot_longer(cols = -year, names_to = "country", values_to = "lifeExp")
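One way to confirm that re-lengthening recovers the original data is to compare lengthened with the year, country and lifeExp columns of gapminder using anti_join() in both directions. This is a sketch, assuming the gapminder package is installed; the helper names original, missing_after and extra_after are illustrative only. country is converted from factor to character because pivot_longer() returns the former column names as character.

```r
suppressMessages(library(tidyverse))
suppressMessages(library(gapminder))

widened <- gapminder %>%
  pivot_wider(id_cols = year, names_from = country, values_from = lifeExp)

lengthened <- widened %>%
  pivot_longer(cols = -year, names_to = "country", values_to = "lifeExp")

# Original columns, with country as character to match pivot_longer()'s output
original <- gapminder %>%
  transmute(year, country = as.character(country), lifeExp)

# Rows found in one tibble but not the other; both should be empty
missing_after <- anti_join(original, lengthened, by = c("year", "country", "lifeExp"))
extra_after <- anti_join(lengthened, original, by = c("year", "country", "lifeExp"))

nrow(lengthened)  # 12 years x 142 countries = 1704 rows
nrow(missing_after)
nrow(extra_after)
```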
The lengthened tibble will only contain year, country and lifeExp (the other columns, continent, gdpPercap and pop, were dropped during the original pivot_wider()), with one row for each combination of country and year.
Choose EITHER “Multivariate Option 1” or “Multivariate Option 2”. Both of these problems have two components:
Don’t worry about producing a plot here. You are expected to use pivot_wider() and pivot_longer() for reshaping.
widened <- gapminder %>%
  pivot_wider(id_cols = year,
              names_from = country,
              names_sep = "_",
              values_from = c("lifeExp", "gdpPercap"))
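The column count follows from the reshape: two value columns for each of the 142 countries, plus year. A minimal check, assuming the gapminder package is installed:

```r
suppressMessages(library(tidyverse))
suppressMessages(library(gapminder))

widened <- gapminder %>%
  pivot_wider(id_cols = year,
              names_from = country,
              names_sep = "_",
              values_from = c(lifeExp, gdpPercap))

# 1 year column + 2 value columns x 142 countries = 285 columns; 12 rows
dim(widened)
# Names join the value column and the country with "_", e.g. "lifeExp_Canada"
c("lifeExp_Canada", "gdpPercap_Canada") %in% names(widened)
```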
The widened tibble contains a lifeExp column and a gdpPercap column for each of the 142 countries, plus the year column, for a total of 285 columns. It has 12 rows, as that is the number of unique years in the gapminder dataset.
lengthened <- widened %>%
  pivot_longer(cols = -year,
               names_to = c(".value", "country"),
               names_sep = "_")
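With names_to = c(".value", "country"), the part of each column name before "_" becomes an output column and the part after it becomes the country value. The sketch below (assuming the gapminder package is installed) checks that the result has the four expected columns and all 1704 country-year rows. It relies on no gapminder country name containing "_", which holds for the standard data set.

```r
suppressMessages(library(tidyverse))
suppressMessages(library(gapminder))

widened <- gapminder %>%
  pivot_wider(id_cols = year,
              names_from = country,
              names_sep = "_",
              values_from = c(lifeExp, gdpPercap))

lengthened <- widened %>%
  pivot_longer(cols = -year,
               names_to = c(".value", "country"),
               names_sep = "_")

# Expected columns: year, country, lifeExp, gdpPercap
names(lengthened)
# One row per country-year: 142 x 12 = 1704
nrow(lengthened)
```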
The lengthened tibble will only contain year, country, lifeExp and gdpPercap (the other columns, continent and pop, were dropped during the original pivot_wider()), with one row for each combination of country and year.
Do ALL of the activities in this section.
Read in the made-up wedding guestlist and email addresses using the following lines (go ahead and copy-paste these):
guest <- suppressMessages(
  read_csv("https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/data/wedding/attend.csv")
)
email <- suppressMessages(
  read_csv("https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/data/wedding/emails.csv")
)
Then, complete the following tasks using the tidyverse (tidyr, dplyr, …). No need to do any pivoting; feel free to leave guest in its current format.
For each guest in the guest list (guest tibble), add a column for email address, which can be found in the email tibble.
First, the email tibble must be separated by guest rather than by party, using separate_rows(). Then its guest column must be renamed to name, which is what the column is called in the guest tibble, using rename().
email_sep <- email %>%
  separate_rows(email, guest, sep = ", ") %>%
  rename(name = guest)
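The effect of separate_rows() followed by rename() is easiest to see on a tiny example. The tibble below is hypothetical toy data (email_toy and its values are made up for illustration, not taken from emails.csv). In this toy version each party has a single email address, so only guest is split and the address is repeated for every guest in the party.

```r
suppressMessages(library(tidyverse))

# Hypothetical toy data: one row per party, guests in one comma-separated string
email_toy <- tibble(
  guest = c("Ann Lee, Bo Lee", "Cy Park"),
  email = c("lees@example.com", "cy@example.com")
)

email_toy_sep <- email_toy %>%
  separate_rows(guest, sep = ", ") %>%  # one row per guest; email is repeated
  rename(name = guest)                  # match the key column name used in guest

email_toy_sep
```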
The email addresses can then be added to the guest tibble with left_join(), matching on name.
guest_email <- left_join(guest,
                         email_sep,
                         by = "name")
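A left join keeps every row of its first argument, so guests without a matching email get NA, while email entries with no matching guest are dropped. A sketch on hypothetical toy tibbles (guest_toy, email_toy_sep and the party column are made up; the real attend.csv has more columns):

```r
suppressMessages(library(tidyverse))

# Hypothetical toy guest list and per-guest email table
guest_toy <- tibble(
  name = c("Ann Lee", "Bo Lee", "Di Ruiz"),
  party = c(1L, 1L, 2L)
)
email_toy_sep <- tibble(
  name = c("Ann Lee", "Bo Lee", "Cy Park"),
  email = c("lees@example.com", "lees@example.com", "cy@example.com")
)

# Every guest is kept; Di Ruiz has no email (NA) and Cy Park is not added
guest_toy_email <- left_join(guest_toy, email_toy_sep, by = "name")
guest_toy_email
```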
Using anti_join(), guests in email_sep who are also in guest are dropped, leaving only the people we have an email address for who are not on the guest list.
email_not_guest <- anti_join(email_sep, guest, by = "name")
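anti_join() is a filtering join: it returns the rows of its first argument that have no match in the second and adds no columns. On the same hypothetical toy tibbles (made up for illustration), only Cy Park remains:

```r
suppressMessages(library(tidyverse))

# Hypothetical toy guest list and per-guest email table
guest_toy <- tibble(
  name = c("Ann Lee", "Bo Lee", "Di Ruiz"),
  party = c(1L, 1L, 2L)
)
email_toy_sep <- tibble(
  name = c("Ann Lee", "Bo Lee", "Cy Park"),
  email = c("lees@example.com", "lees@example.com", "cy@example.com")
)

# People with an email address who do not appear in the guest list
email_toy_not_guest <- anti_join(email_toy_sep, guest_toy, by = "name")
email_toy_not_guest
```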
Since the email tibble was processed to be separated by name instead of by party, it is possible that other members of these three people's parties are on the guest list even though these three are not.
A combined list of everyone on the guest list and all guests in email_sep can be made using full_join().
all_guests <- full_join(guest, email_sep, by = "name")
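A full join keeps the rows of both tibbles, filling the columns from the other side with NA where there is no match. On hypothetical toy tibbles (made up for illustration), the guest without an email gets NA for email, and the email-only person gets NA for the guest-list columns:

```r
suppressMessages(library(tidyverse))

# Hypothetical toy guest list and per-guest email table
guest_toy <- tibble(
  name = c("Ann Lee", "Bo Lee", "Di Ruiz"),
  party = c(1L, 1L, 2L)
)
email_toy_sep <- tibble(
  name = c("Ann Lee", "Bo Lee", "Cy Park"),
  email = c("lees@example.com", "lees@example.com", "cy@example.com")
)

# All four people appear; unmatched columns are filled with NA
all_toy <- full_join(guest_toy, email_toy_sep, by = "name")
all_toy
```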
In the combined tibble, the people who were not on the original guest list have NA values for almost every column except name and email.