How can I fix an issue with Lesotho filling the same color as South Africa in ggplot2, using geom_po

Time:09-05

I am using ggplot2 to plot the number of internet service providers per country in Africa using a dataset obtained from ITU. I merged polygon coordinates from map_data for all African countries to my internet service provider dataframe as follows:

if (!require("pacman")) install.packages("pacman"); library(pacman)
p_load("pacman", "ggplot2", "tidyverse", "dplyr", "maptools", 
       "plyr", "geosphere", "maps")

#Reading in Stata dataset with data on internet service providers per country
net <- structure(list(region = structure(c("Algeria", "Angola", "Benin", 
"Botswana", "Burkina Faso", "Burundi", "Cabo Verde", "Cameroon", 
"Central African Republic", "Chad", "Comoros", "Republic of Congo", 
"Ivory Coast", "Democratic Republic of the Congo", "Djibouti", 
"Egypt", "Equatorial Guinea", "Eritrea", "Swaziland", "Ethiopia", 
"Gabon", "Gambia", "Ghana", "Guinea", "Guinea-Bissau", "Kenya", 
"Lesotho", "Liberia", "Libya", "Madagascar", "Malawi", "Mali", 
"Mauritania", "Mauritius", "Morocco", "Mozambique", "Namibia", 
"Niger", "Nigeria", "Rwanda", "Sao Tome and Principe", "Senegal", 
"Seychelles", "Sierra Leone", "Somalia", "South Africa", "South Sudan", 
"Sudan", "Tanzania", "Togo", "Tunisia", "Uganda", "Zambia", "Zimbabwe", 
"Western Sahara"), format.stata = "2s"), number_internet_providers = structure(c("29", 
"92", "15", "94", "14", "9", "3", "15", "", "", "", "12", "4", 
"24", "1", "6", "14", "5", "15", "1", "25", "8", "18", "5", "3", 
"390", "4", "8", "72", "4", "19", "6", "3", "15", "3", "57", 
"28", "8", "100", "24", "2", "6", "4", "17", "15", "1400", "24", 
"9", "7", "6", "8", "25", "23", "10", "3"), format.stata = "%9s")), row.names = c(NA, 
-55L), class = c("tbl_df", "tbl", "data.frame"))

#Creating map of African continent with countries as polygons
mapdata <- map_data("world", region = c("Morocco", "Algeria", "Tunisia", "Libya",
                                        "Egypt", "Mauritania", "Mali", "Burkina Faso",
                                        "Niger", "Chad", "Sudan", "Ethiopia", "Eritrea",
                                        "Djibouti", "South Sudan", "Somalia", "Senegal",
                                        "Cape Verde", "Gambia", "Guinea", "Guinea-Bissau",
                                        "Liberia", "Sierra Leone", "Ivory Coast", "Ghana",
                                        "Togo", "Benin", "Nigeria", "Cameroon", 
                                        "Central African Republic", "Gabon", "Equatorial Guinea",
                                        "Sao Tome and Principe", "Uganda", "Republic of Congo", 
                                        "Democratic Republic of the Congo", "Rwanda", "Burundi",
                                        "Kenya", "Tanzania", "Angola", "Mozambique", "Malawi", 
                                        "Mauritius", "Seychelles", "Comoros", "Madagascar", 
                                        "Botswana", "Zimbabwe", "Zambia", "Namibia", "Swaziland", 
                                        "Lesotho", "South Africa", "Western Sahara"))

#Calculating centroids of each polygon/country for data labels on map, then merging with the original dataframe                                
cnames <- aggregate(cbind(Longitude, Latitude) ~ region, data=fin_map, FUN=mean)
cnames3 <- merge(cnames, net, by.x="region")

#Merging latitude and longitude from map_data to the dataframe
fin_map2 <- merge(mapdata, cnames3, by.x="region")

#Converting the variable to be plotted into numeric
fin_map2$number_internet_providers <- as.numeric(fin_map2$number_internet_providers)

##Creating map
ggplot(fin_map2, aes(x=long, y=lat)) +
  geom_polygon(aes(group = group, fill = number_internet_providers), colour = "white") +
  geom_text(aes(label = number_internet_providers, x = Longitude, y = Latitude)) +
  labs(title="Number of Internet Providers",subtitle="Number of internet service providers in country", caption="Source = International Telecommunication Union (ITU)") +
  scale_fill_distiller(palette = "Blues", direction = 1) +
  labs(fill='# of Providers') +
  theme(legend.position="right",
        plot.title = element_text(size=24, angle=0, hjust=0.5, face="bold"),
        plot.subtitle = element_text(size=20, angle=0, hjust=0.5,vjust=0),
        plot.caption = element_text(size=12, angle=0, hjust=1, vjust=.5),
        legend.title = element_text(size=16, colour="black", face="bold"),
        legend.text = element_text(colour="black", size=14, face="bold"),
        legend.key.size = unit(1, 'cm'),
        strip.text.x = element_text(size = 14),
        axis.title.x = element_text(size=13, angle=0, hjust=.5, vjust=0, face="bold"),
        axis.title.y = element_text(size=13, angle=90, hjust=.5, vjust=.5, face="bold"),
        axis.text.x = element_text(size=12, angle=0, hjust=.5, vjust=0, face="bold"),
        axis.text.y = element_text(size=12, angle=90, hjust=.5, vjust=.5, face="bold")) +
  coord_map(projection = "mercator",
            xlim = c(-26,64), ylim = c(-35,38))

ggsave(file="Number_providers.png", scale = 2, height=4, width=5, units="in", dpi=150)

Lesotho, which lies entirely inside South Africa, is filled with the same color as South Africa. That is clearly incorrect given the data (4 internet service providers in Lesotho vs. 1400 in South Africa). The result can be seen in the image below.

Map of # of internet service providers per country in Africa

I'm wondering how to fix this issue so that Lesotho is filled in accordance with its respective data, instead of the data from South Africa. Please let me know if I am missing any information that would be useful in answering this question, as this is my first time posting a question to Stack Overflow.

Answer:

There's a note in the geom_polygon documentation:

From R 3.6 and onwards it is possible to draw polygons with holes by providing a subgroup aesthetic that differentiates the outer ring points from those describing holes in the polygon.

Using that hint, and simplifying your example so it is self-contained, I managed to get the plot below. There are a lot of changes to the code, but the important one for you is the aes() argument to geom_polygon().

set.seed(123)
regions <- c("Morocco", "Algeria", "Tunisia", "Libya",
             "Egypt", "Mauritania", "Mali", "Burkina Faso",
             "Niger", "Chad", "Sudan", "Ethiopia", "Eritrea",
             "Djibouti", "South Sudan", "Somalia", "Senegal",
             "Cape Verde", "Gambia", "Guinea", "Guinea-Bissau",
             "Liberia", "Sierra Leone", "Ivory Coast", "Ghana",
             "Togo", "Benin", "Nigeria", "Cameroon", 
             "Central African Republic", "Gabon", "Equatorial Guinea",
             "Sao Tome and Principe", "Uganda", "Republic of Congo", 
             "Democratic Republic of the Congo", "Rwanda", "Burundi",
             "Kenya", "Tanzania", "Angola", "Mozambique", "Malawi", 
             "Mauritius", "Seychelles", "Comoros", "Madagascar", 
             "Botswana", "Zimbabwe", "Zambia", "Namibia", "Swaziland", 
             "South Africa", "Lesotho", "Western Sahara")

net <- data.frame(region = regions, number_internet_providers = sample(length(regions)))

library(ggplot2)

mapdata <- map_data("world", region = regions)

fin_map2 <- merge(mapdata, net, by.x="region")

ggplot(fin_map2, aes(x=long, y=lat)) +
  geom_polygon(aes(group = region, subgroup = group, fill = number_internet_providers), colour = "white") +
  labs(title="Number of Internet Providers",subtitle="Number of internet service providers in country", caption="Source = Fake random data!") +
  scale_fill_distiller(palette = "Blues", direction = 1) +
  labs(fill='# of Providers') +
  coord_map(projection = "mercator",
            xlim = c(-26,64), ylim = c(-35,38))

Created on 2022-09-04 with reprex v2.0.2
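
To see what the subgroup aesthetic does in isolation, here is a minimal sketch on toy data. Everything in it is made up for illustration (the `toy` data frame, the `country` and `ring` columns, the square shapes); only the values 1400 and 4 are borrowed from the question. It assumes that the enclosing country's outline data contains an extra ring tracing the enclave's border, which appears to be what happens with South Africa and Lesotho here.

```r
library(ggplot2)

# Hypothetical toy data, not real map coordinates.
# "outer" has two rings: ring 1 is its outline, ring 2 is the hole
# where the enclave sits.
outer <- data.frame(
  x = c(0, 4, 4, 0,  1, 3, 3, 1),
  y = c(0, 0, 4, 4,  1, 1, 3, 3),
  country = "outer",
  ring = rep(c(1, 2), each = 4),
  value = 1400
)

# "inner" is the enclave: a single ring that exactly fills the hole.
inner <- data.frame(
  x = c(1, 3, 3, 1),
  y = c(1, 1, 3, 3),
  country = "inner",
  ring = 3,
  value = 4
)

toy <- rbind(outer, inner)

# group identifies the whole shape; subgroup identifies the rings within it,
# so ring 2 of "outer" is treated as a hole rather than a filled polygon.
p <- ggplot(toy, aes(x, y)) +
  geom_polygon(aes(group = country, subgroup = ring, fill = value),
               colour = "white") +
  coord_equal()

print(p)
```

If you instead map group to each ring separately, the inner ring of the outer shape is drawn as a filled polygon in its own right, with the outer shape's fill, and it can cover the enclave depending on drawing order.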
