How to subset with ggplot2 without removing the shapes?

Time:02-04

Hi everyone, I'm working with ggplot2 to create a map of my country. This is the dataset and the script I'm using (prov2022 is the shapefile for the map):

#database
COD_REG   COD_PROV   Wage
1         91         530
1         92         520
1         93         410
2         97         300
2         98         205
2         99         501
13        102        700
13        103        800   
13        159        900
18        162        740
18        123        590   
18        119        420
19        162        340
19        123        290   
19        119        120
#script
right_join(prov2022, database, by = "COD_PROV") %>%
  ggplot(aes(fill = `Wage`)) +
  geom_sf(data = ~ subset(., `Wage` > 300 & `Wage` <= 800)) +
  theme_void() +
  theme(legend.title = element_blank()) +
  scale_fill_gradientn(colors = c('white', 'yellow', 'red', 'black')) +
  geom_blank()

It works fine, but I'm also interested in showing the shapes of the areas that I've excluded with subset. My aim was to fill with the color gradient only the regions with Wage > 300 & Wage <= 800, but by setting geom_sf(data = ~ subset(., Wage > 300 & Wage <= 800)) I have completely removed from the map the ones that do not satisfy this condition. I actually need them in the output, but without any fill (just their outlines).

How do I solve this?

UPDATE ABOUT SCRIPT: This is what I'm using after @r2evans' suggestion:

right_join(prov2022, database, by = "COD_PROV") %>%
  ggplot(aes(fill = `Importo medio mensile`)) +
  geom_sf(data = ~ transform(., `Importo medio mensile` = ifelse(`Importo medio mensile` > 1500 & `Importo medio mensile` <= 1700, `Importo medio mensile`[NA], `Importo medio mensile`))) +
  theme_void() +
  theme(legend.title = element_blank()) +
  scale_fill_gradientn(colors = c('white', 'yellow', 'red', 'black'), na.value = "#00000000") +
  geom_blank()

but the result is this error:

Error in FUN(X[[i]], ...) : object 'Importo medio mensile' not found

UPDATE PART 2

What should I do if I want to fill using another variable, Salario reale, while keeping the selection of areas based on the values of the previous variable, Importo medio mensile?

Substituting only the fill variable doesn't work:

right_join(prov2022, database, by = "COD_PROV") %>%
  ggplot(aes(fill = `Salario Reale`)) +
  geom_sf(data = ~ dplyr::mutate(., `Importo medio mensile` = ifelse(`Importo medio mensile` > 1500 & `Importo medio mensile` <= 1700, `Importo medio mensile`, `Importo medio mensile`[NA]))) +
  theme_void() +
  theme(legend.title = element_blank()) +
  scale_fill_gradientn(colors = c('white', 'yellow', 'red', 'black'), na.value = "#00000000") +
  geom_blank()

It colors all the regions of my country, as if the selection that I want to keep (the one based on the variable Importo medio mensile) weren't there. How can I solve this?

UPDATE 3 The solution proposed by r2evans works!!

Answer:

Instead of filtering out the data, just replace (optionally inline) the not-to-be-colored values with NA.

Continuing from my earlier example:

[Figure: ggplot2 map with the other regions' values replaced with NA]

(The use of val[NA] is to make sure we have the one specific class of NA, as there are at least 6 different types of NA.)
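To make the val[NA] point concrete, here is a minimal base-R sketch. The wage vector is made up for illustration (it is not the asker's joined data), and the threshold is the one from the question:

```r
# Hypothetical numeric column standing in for the real data
wage <- c(530, 300, 700, 900)

typeof(NA)        # a bare NA is logical
typeof(wage[NA])  # indexing with NA keeps the column's own type (double here)
length(wage[NA])  # one NA per element, so it lines up inside ifelse()

# Keep values inside the range, blank out the rest with a same-typed NA
masked <- ifelse(wage > 300 & wage <= 800, wage, wage[NA])
masked
```

The same indexing trick should work for character or Date columns, which is why it is handy when you don't want to spell out NA_real_, NA_character_ and so on.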

Granted, gray may not be what you want, so you can fix that with na.value= (its default is na.value="grey50").

ggplot(usa, aes(fill = val)) +
  geom_sf(data = ~ transform(., val = ifelse(val < 0.5, val[NA], val))) +
  scale_fill_gradientn(colors = c('white', 'yellow', 'red', 'black'),
                       na.value = "#00000000") +
  geom_blank()

[Figure: the same map, with the non-colored states now drawn with a transparent fill]

where #00000000 is a fully transparent color. The first six 0s don't matter; the trailing 00 sets the alpha to 0 (transparent).
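If you want to check how R reads such a string, a quick sketch with grDevices::col2rgb (part of base R) shows the alpha channel. The two extra colors are just for comparison and are not part of the plot:

```r
# Decode three 8-digit hex colors into red/green/blue/alpha components
rgba <- grDevices::col2rgb(c("#00000000", "#FF000000", "#000000FF"),
                           alpha = TRUE)

# The first two have alpha 0 (invisible whatever the RGB part is),
# the third has alpha 255 (fully opaque black)
rgba["alpha", ]
```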


I think this means you want something like this:

right_join(prov2022, database, by = "COD_PROV") %>%
  ggplot(aes(fill = `Importo medio mensile`)) +
  geom_sf(data = ~ mutate(., `Importo medio mensile` = ifelse(`Importo medio mensile` > 300 & `Importo medio mensile` <= 800, `Importo medio mensile`, `Importo medio mensile`[NA]))) +
  theme_void() +
  theme(legend.title = element_blank()) +
  scale_fill_gradientn(colors = c('white', 'yellow', 'red', 'black')) +
  geom_blank()

Notes:

  • I updated from `Wage` in your example to `Importo medio mensile` as you mentioned in your comments;
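  • My code above uses transform, which is base R and in general works fine, except when the column names are "not normal R names" (for example, names containing spaces), in which case it tends to replace the offending characters with .s. The renamed column no longer matches the name used in aes(), hence the "object not found" error. The use of dplyr::mutate fixes this problem. You're already using right_join, so I think I'm not adding any dependency.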
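The renaming described in the second note can be reproduced without ggplot2 or sf. This is a small sketch on a made-up one-column data frame (the values are hypothetical); the last step uses plain base-R assignment only to show a variant that leaves the name alone, whereas the code above relies on dplyr::mutate for the same purpose:

```r
# Made-up data frame with a non-syntactic column name
df <- data.frame(`Importo medio mensile` = c(1400, 1600),
                 check.names = FALSE)
names(df)

# transform() rebuilds the data frame with checked names,
# so the spaces in the column name become dots
out <- transform(df, `Importo medio mensile` = `Importo medio mensile` * 2)
names(out)

# Direct assignment keeps the original name untouched
keep <- df
keep[["Importo medio mensile"]] <- keep[["Importo medio mensile"]] * 2
names(keep)
```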

Another way to look at this: the data = ~ mutate(...) is changing the data internally-only, so that the original data is untouched. One could easily do something like this for the same effect.

right_join(prov2022, database, by = "COD_PROV") %>%
  mutate(SOMETHING = ifelse(`Importo medio mensile` > 300 & `Importo medio mensile` <= 800, `Importo medio mensile`, `Importo medio mensile`[NA])) %>%
  ggplot(aes(fill = SOMETHING)) +
  geom_sf() +
  theme_void() +
  theme(legend.title = element_blank()) +
  scale_fill_gradientn(colors = c('white', 'yellow', 'red', 'black')) +
  geom_blank(aes(fill = `Importo medio mensile`))

noting that we needed to redefine fill= in the blank geom so that the correct range of values would be processed by ggplot.
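A rough way to see why the range matters, using the Wage column from the question as a plain vector (no ggplot2 involved, so this only illustrates the numbers a continuous fill scale would be trained on, not the plotting itself):

```r
# Wage values copied from the question's example table
wage <- c(530, 520, 410, 300, 205, 501, 700, 800, 900,
          740, 590, 420, 340, 290, 120)

# Same masking as in the mutate() call above
masked <- ifelse(wage > 300 & wage <= 800, wage, wage[NA])

range(wage)                  # what the scale sees if the original column is also mapped
range(masked, na.rm = TRUE)  # what it sees if only the masked column is mapped
```

With only the masked column mapped, the gradient would be stretched over the narrower range, so the same wage would get a different color than in the unmasked map.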
