Remove categorical variable from ggplot density/histogram

Time:12-04

I have a variable with 3 values: Male, Female, Unknown. For many parts of the analysis I need to keep the Unknown rows, but I want to draw a density/histogram comparing some scores WITHOUT the Unknown group. What do I need to add to exclude one of the values?

My data looks like this:

GenderDescription SATCompositeSuper
Female 730
Female 780
Male 800
Female 1000
Female 1110
Female NA
Male 1050
Male 950
Unknown 900
Male 780

Syntax:

  # Color by groups- gender
  master_df %>%
  drop_na() %>%
  library(ggplot2)
  ggplot(master_df, aes(x=SATCompositeSuper, na.rm=TRUE, color=GenderDescription, 
  fill=GenderDescription)) +
   geom_histogram(aes(y=..density..), alpha=0.5, position="identity") +
   geom_density(alpha=.2)

The current output still includes the Unknown group, because I wasn't thinking about it when I wrote the plot code.

CodePudding user response:

Your example doesn't produce the plot that you showed in your post. However, there are two ways I can think of to filter out the Unknown.

First, you can filter the data before you plot it:

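 library(tidyverse)  # attaches dplyr, tidyr and ggplot2

 # Keep the filtered rows in a separate data frame (plot_df) so that
 # master_df still contains the Unknown rows for the rest of the analysis
 plot_df <- master_df %>%
   drop_na() %>%                            # drop rows with any missing value
   filter(GenderDescription != "Unknown")   # drop the Unknown group

 # after_stat(density) is the current spelling of ..density..
 ggplot(plot_df, aes(x=SATCompositeSuper, color=GenderDescription, fill=GenderDescription)) +
   geom_histogram(aes(y=after_stat(density)), alpha=0.5, position="identity") +
   geom_density(alpha=.2)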
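
To check that the filter does what is intended, here is a minimal sketch that rebuilds the ten sample rows from the question and counts the groups before and after filtering. The names `sample_df` and `plot_df` are made up for illustration, and the sketch assumes the dplyr and tidyr packages are installed.

```r
library(dplyr)
library(tidyr)

# The ten sample rows from the question; sample_df is a hypothetical name
sample_df <- data.frame(
  GenderDescription = c("Female", "Female", "Male", "Female", "Female",
                        "Female", "Male", "Male", "Unknown", "Male"),
  SATCompositeSuper = c(730, 780, 800, 1000, 1110, NA, 1050, 950, 900, 780)
)

# Drop rows with a missing value, then drop the Unknown group
plot_df <- sample_df %>%
  drop_na() %>%
  filter(GenderDescription != "Unknown")

# Group counts in the filtered data: only Female and Male remain
print(table(plot_df$GenderDescription))

# The original data frame is untouched and still has its Unknown row
print(table(sample_df$GenderDescription))
```

On this sample, ten rows go in and eight come out: one row is lost to the NA score and one to the Unknown group.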

And the second is to filter the data as you're plotting:

 # na.rm=TRUE belongs in the geoms, not in aes(); it silences the warning about dropped NA scores
 ggplot(data=master_df[!master_df$GenderDescription %in% c("Unknown"),], aes(x=SATCompositeSuper, color=GenderDescription, fill=GenderDescription)) +
   geom_histogram(aes(y=after_stat(density)), alpha=0.5, position="identity", na.rm=TRUE) +
   geom_density(alpha=.2, na.rm=TRUE)
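
A variation on filtering at plot time is to pipe the filtered rows straight into `ggplot()`, which appears to be what the code in the question was attempting. This is only a sketch: it assumes dplyr and ggplot2 (3.3.0 or later, for `after_stat()`) are installed, `sample_df` is a made-up stand-in for `master_df` built from the question's sample rows, and `bins = 10` is an arbitrary choice.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for master_df, using the sample rows from the question
sample_df <- data.frame(
  GenderDescription = c("Female", "Female", "Male", "Female", "Female",
                        "Female", "Male", "Male", "Unknown", "Male"),
  SATCompositeSuper = c(730, 780, 800, 1000, 1110, NA, 1050, 950, 900, 780)
)

# Filter inside the pipeline; sample_df itself is never modified.
# The data steps are chained with %>%, the plot layers with +.
p <- sample_df %>%
  filter(GenderDescription != "Unknown", !is.na(SATCompositeSuper)) %>%
  ggplot(aes(x = SATCompositeSuper,
             color = GenderDescription,
             fill = GenderDescription)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10,
                 alpha = 0.5, position = "identity") +
  geom_density(alpha = 0.2)

print(p)
```

Because nothing is assigned back to the data frame, the Unknown rows stay available for every other part of the analysis.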