I am working on a uni project with EU-SILC data. I want to create a new variable where all households are assigned to their corresponding housing cost group to create a stacked density plot with the income distribution in relation to housing cost.
I encountered two problems:
- I cannot create the variable hcost_group because my housing cost variable, which is the basis for assigning the households to the groups, has 47 NAs (out of nearly 70,000 observations). I tried many different ways of removing the NAs when creating the new variable, but I keep getting an error message.
- Since I don't want to drop the households without a housing cost from the data altogether, the hcost_group variable will be shorter than my income variable. How can I exclude, just for the plot, the income of the households for which I don't have a housing cost?
Thanks a lot in advance!
Here is my code (incl. error messages) for creating the variable and the plot:
data <- data %>% filter(!is.na(hcost)) %>% group_by(country) %>%
mutate(hcost_group = quantcut(hcost, q=c(0.1, 0.2, 0.3, 0.4)))
Error: Problem with `mutate()` column `hcost_group`.
i `hcost_group = quantcut(hcost, q = c(0.1, 0.2, 0.3, 0.4))`.
x missing value where TRUE/FALSE needed
i The error occurred in group 6: country = "UK".
Run `rlang::last_error()` to see where the error occurred.
ggplot(data = data, aes(x = decile, group = hcost_group, fill = hcost_group)) +
  geom_density(adjust = 1.5, position = "fill") +
  facet_wrap(~country) +
  xlab("Income decile") +
  ylab("Share of groups by housing cost burden") +
  scale_fill_discrete(name = "Housing cost burden (housing costs as a share of income)",
                      labels = c("0-10%", "10-20%", "20-30%",
                                 "30-40%", "40-100%"))
Error in FUN(X[[i]], ...) : object 'hcost_group' not found
I also tried "na.rm = TRUE", "na.omit()" and "complete.cases".
CodePudding user response:
For the first problem, I believe the issue is not the NAs (though it is hard to say without seeing the data). It seems that your quantcut call is not getting a suitable q parameter: q is usually an integer number of groups (it can also be a vector of quantile probabilities, but then the vector should span 0 to 1).
For the second problem, create a separate data frame with the filtered data and use that one for the plot.
Would it not also be possible to run your mutate before the group_by?
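A minimal sketch of the two suggestions above, assuming quantcut() comes from the gtools package and using a made-up toy data frame in place of the EU-SILC extract. The name plot_data and the choice of five groups are illustrative assumptions, not something the question specifies.

```r
library(dplyr)
library(gtools)  # assumed source of quantcut()

# Toy data standing in for the EU-SILC extract (all values are made up)
set.seed(1)
data <- data.frame(
  country = rep(c("AT", "UK"), each = 50),
  hcost   = c(runif(48, 0, 1), NA, NA, runif(50, 0, 1)),
  decile  = sample(1:10, 100, replace = TRUE)
)

# Separate data frame used only for the plot; `data` keeps every household
plot_data <- data %>%
  filter(!is.na(hcost)) %>%
  group_by(country) %>%
  # integer q: five equal-sized groups per country, stored as codes 1..5
  mutate(hcost_group = as.integer(quantcut(hcost, q = 5))) %>%
  ungroup()
```

Passing plot_data (rather than data) to ggplot() then leaves out only the households without a housing cost, while the original data frame stays complete.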
CodePudding user response:
Does this have something to do with the random before the mutate() call?
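library(dplyr)   # %>%, group_by(), mutate()
library(tidyr)   # drop_na()
library(gtools)  # quantcut()

data <- data %>%
  drop_na(hcost) %>%        # remove rows with a missing housing cost
  group_by(country) %>%
  mutate(
    hcost_group = quantcut(hcost, q = c(.1, .2, .3, .4))
  )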
I would also ensure that hcost is stored as a numeric vector.
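The numeric check can be sketched as follows, together with an alternative to quantcut(). The plot labels in the question ("0-10%" up to "40-100%") look like fixed breakpoints rather than quantiles, so this sketch swaps in base R cut() with fixed breaks. It assumes that hcost is a share between 0 and 1, which the question does not confirm, and the toy values are made up.

```r
# Toy data (made-up values); hcost is assumed to be a share between 0 and 1,
# stored here as character to mimic a wrongly typed column
data <- data.frame(
  country = c("AT", "AT", "AT", "UK", "UK", "UK"),
  hcost   = c("0.05", "0.15", NA, "0.35", "0.72", "0.25"),
  stringsAsFactors = FALSE
)

str(data$hcost)                       # inspect the storage type
data$hcost <- as.numeric(data$hcost)  # convert to numeric; NA stays NA

# Fixed breakpoints instead of quantiles; a missing hcost gives a missing group
data$hcost_group <- cut(
  data$hcost,
  breaks = c(0, 0.1, 0.2, 0.3, 0.4, 1),
  labels = c("0-10%", "10-20%", "20-30%", "30-40%", "40-100%"),
  include.lowest = TRUE
)
```

Because cut() returns NA for a missing input, hcost_group has the same length as the other columns and no household is dropped from data. For the plot alone, the rows can be excluded with data[!is.na(data$hcost_group), ].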