The columns of interest of the data look like this
wheels
name country
1 The Southern Star Australia
2 Wheel of Brisbane Australia
3 Wheel of Perth Australia
4 Wiener Riesenrad Austria
5 Niagara SkyWheel Canada
6 Beijing Great Wheel China
7 Changsha Ferris Wheel China
How to color each country according to the number of rows on the following map ?
library(tidyverse); library(maptools); library(sf)
# Main data
wheels <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-08-09/wheels.csv')
# Data for the map
world <- map_data("world")
data(wrld_simpl)
countries <- st_as_sf(wrld_simpl)
# Plot
df_plot <- right_join(wheels, countries, by = c("country" = "NAME"))
ggplot(df_plot, aes(LON, LAT))
geom_map(data = world, map = world, aes(long, lat, map_id = region), color = "white", fill = "#66BBBB", size = 0.1)
Do I have to summarise the data using dplyr::count
for example ?
EDIT : I made a dataframe giving the number of rows per country in the wheels
dataframe
# Number of rows per country
data_count <- wheels %>% count(country)
# Joining the dataframes
df_plot <- right_join(wheels, countries, by = c("country" = "NAME")) %>%
left_join(data_count)
CodePudding user response:
You can summarize the wheels
data frame by using group_by
and count
. Then left join this to your countries data frame:
df_plot <- left_join(countries, wheels %>% group_by(country) %>% count(),
by = c("NAME" = "country"))
ggplot(df_plot)
geom_sf(aes(fill = n))
scale_fill_viridis_c(na.value = "#000035")
The problem here is that your left_join
requires an exact match between the name of the country stored in wheels$country
and countries$NAME
, but there are some mismatches:
unique(wheels$country)[is.na(match(unique(wheels$country), countries$NAME))]
#> [1] "USA" "Tailand" "UK" "UAE"
#> [5] "Phillippines" "Iran" "S Korea" "Dubai"
So a more complete solution would be:
left_join(countries,
wheels %>%
mutate(country = case_when(
country == "UK" ~ "United Kingdom",
country == "USA" ~ "United States",
country == "Tailand" ~ "Thailand",
country == "UAE" ~ "United Arab Emirates",
country == "Dubai" ~ "United Arab Emirates",
country == "S Korea" ~ "Korea, Republic of",
country == "Phillippines" ~ "Philippines",
country == "Iran" ~ "Iran (Islamic Republic of)",
TRUE ~ country)) %>%
count(country),
by = c("NAME" = "country")) %>%
ggplot()
geom_sf(aes(fill = n))
scale_fill_viridis_c(na.value = "#000040")