I have a dataset on college enrollment. I'm trying to find the proportion of students enrolled in biology at each institution. I first find the total enrollment (EFTOTLT) for each school using:
# find sum of students by school
total_enrollment <- school_data_unit_cip %>%
  group_by(UNITID) %>%
  summarise(Freq = sum(EFTOTLT))
This yields a tibble that's 2,207 x 2. Then I find the biology enrollment for each school using:
# find total biology enrollment by school
total_biol_enrollment <- school_data_unit_cip %>%
  group_by(UNITID) %>%
  filter(CIPCODE == "26") %>%
  summarise(Freq = sum(EFTOTLT))
This yields a tibble that's only 1,560 x 2, so there are obviously schools that don't offer biology or don't have biology students.
Is there a way to deselect schools from the first tibble that don't have the CIPCODE 26? Or I guess is there a way to remove schools from the first list that don't exist in the second list?
CodePudding user response:
Without sample data it's a guess, but assuming that each school may have more than one CIPCODE, and you want only schools that have at least one row with CIPCODE == "26", then perhaps:
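school_data_unit_cip %>%
  group_by(UNITID) %>%   # evaluate the test per school, not over the whole table
  filter("26" %in% CIPCODE)   # keep schools with at least one CIPCODE "26" row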
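If the goal is literally "remove schools from the first list that don't exist in the second list", a filtering join might also do it. This is only a sketch on made-up toy data (the three schools and their numbers are invented for illustration; only the column names come from the question), and it assumes UNITID identifies a school in both tibbles:

```r
library(dplyr)

# Toy data, made up for illustration: one row per school and CIP code
school_data_unit_cip <- tibble(
  UNITID  = c(1, 1, 2, 3, 3),
  CIPCODE = c("26", "11", "11", "26", "52"),
  EFTOTLT = c(10, 30, 50, 5, 15)
)

# Total enrollment per school (same as in the question)
total_enrollment <- school_data_unit_cip %>%
  group_by(UNITID) %>%
  summarise(Freq = sum(EFTOTLT))

# Biology enrollment per school (same as in the question)
total_biol_enrollment <- school_data_unit_cip %>%
  group_by(UNITID) %>%
  filter(CIPCODE == "26") %>%
  summarise(Freq = sum(EFTOTLT))

# Keep only the rows of the first tibble whose UNITID also appears in the second
total_enrollment_biol_schools <- total_enrollment %>%
  semi_join(total_biol_enrollment, by = "UNITID")
```

semi_join() only filters rows of the left table, so it adds no columns from total_biol_enrollment. In the toy data school 2 has no CIPCODE "26" row and is dropped.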
CodePudding user response:
updated after the remarks in the other answer.
I think you can filter them out if you group first, but I don't know for sure without the data:
total_biol_enrollment <- school_data_unit_cip %>%
  group_by(UNITID) %>%
  # keep all rows of schools that have at least one CIPCODE "26" row
  filter(any(CIPCODE == "26"))
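Since the end goal in the question is the proportion of biology students per school, it may be simpler to compute both sums in one grouped summarise and skip the second tibble entirely. A rough sketch, again on invented toy data; the names total, biol, prop_biol and biol_share are just placeholders I picked:

```r
library(dplyr)

# Toy data, made up for illustration: one row per school and CIP code
school_data_unit_cip <- tibble(
  UNITID  = c(1, 1, 2, 3, 3),
  CIPCODE = c("26", "11", "11", "26", "52"),
  EFTOTLT = c(10, 30, 50, 5, 15)
)

biol_share <- school_data_unit_cip %>%
  group_by(UNITID) %>%
  summarise(
    total     = sum(EFTOTLT),                   # all students at the school
    biol      = sum(EFTOTLT[CIPCODE == "26"]),  # biology students only (0 if none)
    prop_biol = biol / total                    # share of biology students
  ) %>%
  filter(biol > 0)                              # drop schools with no biology students
```

Note that filter(biol > 0) also drops a school that lists CIPCODE "26" with zero enrolled students; leave that step out if those schools should stay with a proportion of 0.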