Load required packages:
suppressMessages(library(tidyverse))
suppressMessages(library(gapminder))
suppressMessages(library(DT))
Choose EITHER “Univariate Option 1” or “Univariate Option 2”. Both of these problems have three components:
You are expected to use pivot_wider() and pivot_longer() for reshaping, and ggplot2 for the plot.
Regarding the plot:
widened <- gapminder %>%
  pivot_wider(id_cols = year, names_from = "country", values_from = lifeExp)
The widened tibble has only 12 rows, as there are only 12 years in the gapminder data set.
widened_plot <- widened %>%
  ggplot(aes(Canada, China)) +
  geom_point(aes(color = as.factor(year))) +
  ggtitle("Life Expectancy from 1952 to 2007") +
  ylab("Life Expectancy in China (years)") +
  xlab("Life Expectancy in Canada (years)") +
  scale_color_discrete("Year") +
  theme(text = element_text(size = 18))
lengthened <- widened %>%
  pivot_longer(cols = -year, names_to = "country", values_to = "lifeExp")
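As a quick check of the widen-then-lengthen round trip, here is a minimal sketch on hypothetical toy data (the tibble toy, its countries "A" and "B", and all values are made up for illustration and are not taken from gapminder). It shows that a column left out of pivot_wider() does not come back after pivot_longer().

```r
library(tibble)
library(tidyr)

# Hypothetical toy data: two countries, two years, plus an extra column (pop).
toy <- tribble(
  ~country, ~year, ~lifeExp, ~pop,
  "A",      2002,  70.1,     10,
  "A",      2007,  71.3,     11,
  "B",      2002,  65.4,     20,
  "B",      2007,  66.8,     21
)

# One row per year, one column per country. pop is dropped because it is
# neither an id column nor used in names_from / values_from.
toy_wide <- toy %>%
  pivot_wider(id_cols = year, names_from = country, values_from = lifeExp)

# Back to one row per year-country pair. pop is not recovered.
toy_long <- toy_wide %>%
  pivot_longer(cols = -year, names_to = "country", values_to = "lifeExp")

names(toy_wide)  # "year" "A" "B"
names(toy_long)  # "year" "country" "lifeExp"
```

The round trip restores the row count of the toy data (4 rows) but not its dropped column, which mirrors what happens with the gapminder columns.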
The lengthened tibble will only contain year, country, and lifeExp (since the other columns continent, gdpPercap, and pop were dropped during the original pivot_wider()), with one row for each combination of country and year.
Choose EITHER “Multivariate Option 1” or “Multivariate Option 2”. Both of these problems have two components:
Don’t worry about producing a plot here. You are expected to use pivot_wider() and pivot_longer() for reshaping.
widened <- gapminder %>%
  pivot_wider(id_cols = year,
              names_from = country,
              names_sep = "_",
              values_from = c("lifeExp", "gdpPercap"))
The widened tibble contains a lifeExp column and a gdpPercap column for each of the 142 countries, plus year, for a total of 285 columns.
The tibble has 12 rows, as that is the number of unique years in the gapminder dataset.
lengthened <- widened %>%
  pivot_longer(cols = -year,
               names_to = c(".value", "country"),
               names_sep = "_")
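To make the role of ".value" concrete, here is a minimal sketch on a hypothetical wide tibble (toy_wide, the countries "A" and "B", and the numbers are invented for illustration). The part of each column name before the separator becomes an output column, and the part after it becomes a value in country. This relies on the separator not appearing inside either part of a name.

```r
library(tibble)
library(tidyr)

# Hypothetical wide data: one column per measure-country pair.
toy_wide <- tribble(
  ~year, ~lifeExp_A, ~lifeExp_B, ~gdpPercap_A, ~gdpPercap_B,
  2002,  70.1,       65.4,       30000,        4000,
  2007,  71.3,       66.8,       32000,        4500
)

# ".value" keeps the first name piece (lifeExp, gdpPercap) as column names;
# the second piece (A, B) is stored in the new country column.
toy_long <- toy_wide %>%
  pivot_longer(cols = -year,
               names_to = c(".value", "country"),
               names_sep = "_")

names(toy_long)  # "year" "country" "lifeExp" "gdpPercap"
```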
The lengthened tibble will only contain year, country, lifeExp, and gdpPercap (since the other columns continent and pop were dropped during the original pivot_wider()), with one row for each combination of country and year.
Do ALL of the activities in this section.
Read in the made-up wedding guestlist and email addresses using the following lines (go ahead and copy-paste these):
guest <- suppressMessages(
  read_csv("https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/data/wedding/attend.csv")
)
email <- suppressMessages(
  read_csv("https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/data/wedding/emails.csv")
)
Then, complete the following tasks using the tidyverse (tidyr, dplyr, …). No need to do any pivoting – feel free to leave guest in its current format.
For each guest in the guestlist (guest tibble), add a column for email address, which can be found in the email tibble.
The email tibble must be separated by guest as opposed to by party, using separate_rows().
The guest column in email must be renamed to name, as it appears in the guest tibble, using rename().
email_sep <- email %>%
  # Only the guest column holds several comma-separated names, so only it is split.
  separate_rows(guest, sep = ", ") %>%
  rename(name = guest)
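The splitting step can be sketched on a small made-up table (party_emails, the names, and the example.com addresses are all hypothetical). Each party's comma-separated guest string becomes one row per guest, and the party's email is repeated on every one of those rows.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hypothetical party-level data: one row per party, guests in a single string.
party_emails <- tibble(
  guest = c("Ann Lee, Bob Lee", "Cat Wu"),
  email = c("lee@example.com", "cat@example.com")
)

# Split only the guest column on ", ", then rename it so that it can be
# joined against a table keyed by name.
per_guest <- party_emails %>%
  separate_rows(guest, sep = ", ") %>%
  rename(name = guest)

per_guest$name   # "Ann Lee" "Bob Lee" "Cat Wu"
per_guest$email  # "lee@example.com" "lee@example.com" "cat@example.com"
```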
The email column can then be added to guest using left_join().
guest_email <- left_join(guest,
                         email_sep,
                         by = "name")
By using anti_join(), guests in email_sep who are also in guest will be dropped.
email_not_guest <- anti_join(email_sep, guest, by = "name")
Since the email tibble was processed to be separated by name instead of party, it is possible that other members of the party are on the guest list but these three are not.
We can keep everyone on the guest list, and all guests in email_sep, by using full_join().
all <- full_join(guest, email_sep, by = "name")
Guests who appear only in email_sep are appended to the end of the tibble, with NA values for every column except name and email.
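The difference between the three joins used in this section can be sketched on hypothetical toy tables (guests, emails, and every name and address in them are made up for illustration). The only thing that changes between the joins is which rows are kept.

```r
library(tibble)
library(dplyr)

# Hypothetical guest list and per-guest email table.
guests <- tibble(name = c("Ann", "Bob", "Cat"), party = c(1, 1, 2))
emails <- tibble(
  name  = c("Ann", "Bob", "Dan"),
  email = c("ab@example.com", "ab@example.com", "dan@example.com")
)

# Keep every guest; guests without an email get NA.
left <- left_join(guests, emails, by = "name")   # 3 rows; Cat has NA email

# Keep only email rows with no matching guest.
anti <- anti_join(emails, guests, by = "name")   # 1 row: Dan

# Keep everyone from both tables; unmatched email rows are appended last.
full <- full_join(guests, emails, by = "name")   # 4 rows; Dan has NA party
```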