Home > Software engineering >  How to color countries according to the count of rows on a map in R?
How to color countries according to the count of rows on a map in R?

Time:08-10

The columns of interest of the data look like this

wheels

  name                  country  
1 The Southern Star     Australia
2 Wheel of Brisbane     Australia
3 Wheel of Perth        Australia
4 Wiener Riesenrad      Austria  
5 Niagara SkyWheel      Canada   
6 Beijing Great Wheel   China    
7 Changsha Ferris Wheel China   

How to color each country according to the number of rows on the following map ?

library(tidyverse); library(maptools); library(sf)
# Main data
wheels <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-08-09/wheels.csv')

# Data for the map
world <- map_data("world")
data(wrld_simpl)
countries <- st_as_sf(wrld_simpl)

# Plot
df_plot <- right_join(wheels, countries, by = c("country" = "NAME"))
ggplot(df_plot, aes(LON, LAT))  
  geom_map(data = world, map = world, aes(long, lat, map_id = region), color = "white", fill = "#66BBBB", size = 0.1) 

Do I have to summarise the data using dplyr::count for example ?

EDIT : I made a dataframe giving the number of rows per country in the wheels dataframe

# Number of rows per country
data_count <- wheels %>% count(country)
# Joining the dataframes 
df_plot <- right_join(wheels, countries, by = c("country" = "NAME")) %>% 
  left_join(data_count)

enter image description here

CodePudding user response:

You can summarize the wheels data frame by using group_by and count. Then left join this to your countries data frame:

df_plot <- left_join(countries, wheels %>% group_by(country) %>% count(),
                     by = c("NAME" = "country"))

ggplot(df_plot)   
  geom_sf(aes(fill = n))  
  scale_fill_viridis_c(na.value = "#000035")

enter image description here

The problem here is that your left_join requires an exact match between the name of the country stored in wheels$country and countries$NAME, but there are some mismatches:

unique(wheels$country)[is.na(match(unique(wheels$country), countries$NAME))]
#> [1] "USA"          "Tailand"      "UK"           "UAE"         
#> [5] "Phillippines" "Iran"         "S Korea"      "Dubai" 

So a more complete solution would be:

left_join(countries,
          wheels %>%
            mutate(country = case_when(
               country == "UK"           ~ "United Kingdom",
               country == "USA"          ~ "United States",
               country == "Tailand"      ~ "Thailand",
               country == "UAE"          ~ "United Arab Emirates",
               country == "Dubai"        ~ "United Arab Emirates",
               country == "S Korea"      ~ "Korea, Republic of",
               country == "Phillippines" ~ "Philippines",
               country == "Iran"         ~ "Iran (Islamic Republic of)",
               TRUE ~ country)) %>%
            count(country),
           by = c("NAME" = "country")) %>%
  ggplot()   
  geom_sf(aes(fill = n))  
  scale_fill_viridis_c(na.value = "#000040")

enter image description here

  • Related