Hi guys, I'm working with ggplot2 and creating a geographic representation of my country. This is the dataset and the script I'm using (prov2022 is the shapefile for the map):
#database
COD_REG COD_PROV Wage
1 91 530
1 92 520
1 93 410
2 97 300
2 98 205
2 99 501
13 102 700
13 103 800
13 159 900
18 162 740
18 123 590
18 119 420
19 162 340
19 123 290
19 119 120
#script
right_join(prov2022, database, by = "COD_PROV") %>%
  ggplot(aes(fill = `Wage`)) +
  geom_sf(data = ~ subset(., `Wage` > 300 & `Wage` <= 800)) +
  theme_void() +
  theme(legend.title = element_blank()) +
  scale_fill_gradientn(colors = c('white', 'yellow', 'red', 'black')) +
  geom_blank()
It works fine, but I'm also interested in visualizing the shapes of the areas that I've excluded with subset. My purpose was to fill with the color gradient only the regions with Wage > 300 & Wage <= 800, but by setting geom_sf(data = ~ subset(., Wage > 300 & Wage <= 800)) I have completely removed the regions that do not satisfy this condition from my map. I actually need them in the output, but without being filled (just their shapes).
How do I solve this?
UPDATE ABOUT SCRIPT This is what I'm using after @r2evans' suggestion
right_join(prov2022, database, by = "COD_PROV") %>%
  ggplot(aes(fill = `Importo medio mensile`)) +
  geom_sf(data = ~ transform(., `Importo medio mensile` = ifelse(`Importo medio mensile` > 1500 & `Importo medio mensile` <= 1700, `Importo medio mensile`[NA], `Importo medio mensile`))) +
  theme_void() +
  theme(legend.title = element_blank()) +
  scale_fill_gradientn(colors = c('white', 'yellow', 'red', 'black'), na.value = "#00000000") +
  geom_blank()
but the result is this error:
Error in FUN(X[[i]], ...) : object 'Importo medio mensile' not found
UPDATE PART 2
If I want to fill using another variable, Salario reale, while maintaining the selection of areas made with the values of the previous variable, Importo medio mensile, what should I do?
Substituting only the fill variable doesn't work:
right_join(prov2022, database, by = "COD_PROV") %>%
  ggplot(aes(fill = `Salario Reale`)) +
  geom_sf(data = ~ dplyr::mutate(., `Importo medio mensile` = ifelse(`Importo medio mensile` > 1500 & `Importo medio mensile` <= 1700, `Importo medio mensile`, `Importo medio mensile`[NA]))) +
  theme_void() +
  theme(legend.title = element_blank()) +
  scale_fill_gradientn(colors = c('white', 'yellow', 'red', 'black'), na.value = "#00000000") +
  geom_blank()
It colors all the regions of my country, as if the subset that I want to maintain (the one based on the variable Importo medio mensile) weren't there. How can I solve this?
UPDATE 3 The solution proposed by r2evans works!!
CodePudding user response:
Instead of filtering out the data, just replace (optionally inline) the not-to-be-colored values with NA. (The use of val[NA] is to make sure we have the one specific class of NA, as there are at least 6 different types of NA.)
Granted, gray may not be what you want, so you can fix that with na.value= (its default is na.value="grey50").
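To make the val[NA] idiom concrete, here is a minimal sketch in base R. The vector val and its values are made up for illustration; only the idiom itself comes from the answer.

```r
# Hypothetical numeric values standing in for the fill variable.
val <- c(0.2, 0.7, 0.9)

class(NA)        # a bare NA is logical
class(val[NA])   # indexing with NA keeps the class of val (numeric here)
length(val[NA])  # the logical NA index is recycled, so the result is as long as val

# Values below 0.5 are replaced by a numeric NA; the others are kept.
masked <- ifelse(val < 0.5, val[NA], val)
masked
```

The masked vector keeps the numeric type of val, which is what the fill scale expects.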
ggplot(usa, aes(fill = val)) +
  geom_sf(data = ~ transform(., val = ifelse(val < 0.5, val[NA], val))) +
  scale_fill_gradientn(colors = c('white', 'yellow', 'red', 'black'),
                       na.value = "#00000000") +
  geom_blank()
where #00000000 is a fully-transparent color. The first six 0s don't matter; the trailing two 00 indicate an alpha of 0 (transparent).
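The claim about the trailing two hex digits can be checked with grDevices::col2rgb(), which ships with base R. This is only a quick sketch; the two extra colors are arbitrary examples chosen for comparison.

```r
# Decode three 8-digit hex colors into red, green, blue and alpha channels.
rgba <- grDevices::col2rgb(c("#00000000", "#FF000000", "#FF0000FF"), alpha = TRUE)

# Rows are named red, green, blue, alpha; one column per input color.
rgba

# The first two colors differ in their RGB digits but both have alpha 0,
# so both are fully transparent; the third has alpha 255 (opaque).
alpha_values <- unname(rgba["alpha", ])
alpha_values
```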
I think this means you want something like this:
right_join(prov2022, database, by = "COD_PROV") %>%
  ggplot(aes(fill = `Importo medio mensile`)) +
  geom_sf(data = ~ mutate(., `Importo medio mensile` = ifelse(`Importo medio mensile` > 300 & `Importo medio mensile` <= 800, `Importo medio mensile`, `Importo medio mensile`[NA]))) +
  theme_void() +
  theme(legend.title = element_blank()) +
  scale_fill_gradientn(colors = c('white', 'yellow', 'red', 'black')) +
  geom_blank()
Notes:
- I updated from `Wage` in your example to `Importo medio mensile` as you mentioned in your comments;
- My code above uses transform, which is base R and in general works fine, except when the names being used are "not normal R names", in which case it tends to add .s to the name. The use of dplyr::mutate fixes this problem. You're already using right_join, so I think I'm not adding any dependency.
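The renaming behaviour of transform is likely what produced the "object 'Importo medio mensile' not found" error. Here is a small sketch on a plain data frame, assuming dplyr is installed; the rows are hypothetical and only the column names come from the question.

```r
# Hypothetical rows; check.names = FALSE keeps the column name with spaces.
database <- data.frame(
  COD_PROV = c(91, 92, 93),
  `Importo medio mensile` = c(1450, 1600, 1750),
  check.names = FALSE
)

# transform() rebuilds the data frame with syntactically valid names,
# so the spaces in the column name are replaced by dots.
via_transform <- transform(database, `Importo medio mensile` = `Importo medio mensile` * 1)
names(via_transform)

# dplyr::mutate() leaves the original column name untouched.
via_mutate <- dplyr::mutate(database, `Importo medio mensile` = `Importo medio mensile` * 1)
names(via_mutate)
```

After transform, an aesthetic that still refers to `Importo medio mensile` has nothing to match, while the mutate result can be used unchanged.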
Another way to look at this: the data = ~ mutate(...) is changing the data internally-only, so that the original data is untouched. One could easily do something like this for the same effect:
right_join(prov2022, database, by = "COD_PROV") %>%
  mutate(SOMETHING = ifelse(`Importo medio mensile` > 300 & `Importo medio mensile` <= 800, `Importo medio mensile`, `Importo medio mensile`[NA])) %>%
  ggplot(aes(fill = SOMETHING)) +
  geom_sf() +
  theme_void() +
  theme(legend.title = element_blank()) +
  scale_fill_gradientn(colors = c('white', 'yellow', 'red', 'black')) +
  geom_blank(aes(fill = `Importo medio mensile`))
noting that we needed to redefine fill= in the blank geom so that the correct range of values would be processed by ggplot.
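For the follow-up in UPDATE PART 2 (fill by one variable, select areas by another), one possible approach is to put the condition on `Importo medio mensile` but take the kept values from `Salario Reale`. This is only a sketch on a plain data frame with hypothetical values, and the column name fill_value is made up for illustration.

```r
# Hypothetical values; only the column names come from the question.
database <- data.frame(
  COD_PROV = c(91, 92, 93, 97),
  `Importo medio mensile` = c(1450, 1600, 1700, 1750),
  `Salario Reale` = c(1200, 1350, 1400, 1500),
  check.names = FALSE
)

# The selection is driven by Importo medio mensile ...
keep <- database[["Importo medio mensile"]] > 1500 &
  database[["Importo medio mensile"]] <= 1700

# ... while the values that get colored come from Salario Reale.
# fill_value is an illustrative name, not one used in the original post.
database$fill_value <- ifelse(keep, database[["Salario Reale"]], NA_real_)
database$fill_value
```

Under these assumptions, the joined data would then be plotted with aes(fill = fill_value) and na.value = "#00000000", so that the unselected areas keep their outlines but are not filled.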