I have a variable with three values: Male, Female, and Unknown. For many parts of the analysis I need to keep Unknown, but I want to draw a density/histogram comparing some scores WITHOUT the Unknown group. What do I need to add to exclude one of the values?
My data looks like this:
GenderDescription | SATCompositeSuper |
---|---|
Female | 730 |
Female | 780 |
Male | 800 |
Female | 1000 |
Female | 1110 |
Female | NA |
Male | 1050 |
Male | 950 |
Unknown | 900 |
Male | 780 |
Syntax:
# Color by groups- gender
master_df %>%
drop_na() %>%
library(ggplot2)
ggplot(master_df, aes(x=SATCompositeSuper, na.rm=TRUE, color=GenderDescription,
  fill=GenderDescription)) +
  geom_histogram(aes(y=..density..), alpha=0.5, position="identity") +
  geom_density(alpha=.2)
Current Output (because I wasn't thinking about the Unknown) is this:
CodePudding user response:
Your example doesn't produce the plot that you showed in your post. That said, I can think of two ways to filter out Unknown.
First, you can filter the data before you plot it:
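library(tidyverse)  # attaches dplyr (filter), tidyr (drop_na) and ggplot2
# Save the filtered rows under a new name so master_df keeps Unknown for the rest of the analysis
plot_df <- master_df %>%
  drop_na() %>%  # drops rows with an NA in any column
  filter(GenderDescription != "Unknown")
ggplot(plot_df, aes(x=SATCompositeSuper, color=GenderDescription, fill=GenderDescription)) +
  geom_histogram(aes(y=..density..), alpha=0.5, position="identity") +
  geom_density(alpha=.2)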
Second, you can filter the data inside the ggplot() call itself, so nothing is reassigned:
ggplot(data=master_df[!master_df$GenderDescription %in% c("Unknown"),], aes(x=SATCompositeSuper, color=GenderDescription, fill=GenderDescription)) +
  geom_histogram(aes(y=..density..), alpha=0.5, position="identity", na.rm=TRUE) +
  geom_density(alpha=.2, na.rm=TRUE)
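For reference, here is a minimal self-contained sketch of the first approach, using the ten rows from the question. The name `plot_df` and the `bins = 10` setting are my own assumptions, not something from your code. I also assume ggplot2 3.3.0 or later, where `after_stat(density)` is the newer spelling of `..density..`.

```r
library(dplyr)
library(ggplot2)

# Sample data copied from the table in the question
master_df <- data.frame(
  GenderDescription = c("Female", "Female", "Male", "Female", "Female",
                        "Female", "Male", "Male", "Unknown", "Male"),
  SATCompositeSuper = c(730, 780, 800, 1000, 1110, NA, 1050, 950, 900, 780)
)

# plot_df is a hypothetical name; master_df itself is left untouched,
# so Unknown stays available for the other parts of the analysis
plot_df <- master_df %>%
  filter(GenderDescription != "Unknown", !is.na(SATCompositeSuper))

# Overlaid density-scaled histograms plus density curves, one per gender
p <- ggplot(plot_df, aes(x = SATCompositeSuper,
                         color = GenderDescription,
                         fill = GenderDescription)) +
  geom_histogram(aes(y = after_stat(density)), alpha = 0.5,
                 position = "identity", bins = 10) +  # bins = 10 is an arbitrary choice
  geom_density(alpha = 0.2)

print(p)
```

Filtering on `!is.na(SATCompositeSuper)` rather than calling `drop_na()` with no arguments only removes rows where the plotted score is missing, which may matter if your real data frame has NAs in other columns.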