I want to subset a dataset by several levels of a categorical variable in RStudio.
With the subset function I can do it for just one level:
new_df<-subset(df, df$cat.var=="level.1")
How do I subset on more than one level?
Answer:
You can use %in%. This is a membership operator: give it a vector of the levels of cat.var whose rows you want to keep.
new_df <- subset(df, df$cat.var %in% c("level.1", "level.2"))
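To see why this works, it may help to look at what %in% returns on its own: one TRUE or FALSE per element of the left-hand vector, which subset then uses to pick rows. The sketch below uses made-up values (x, keep and is_kept are hypothetical names, not from the question) and contrasts %in% with ==, which recycles the right-hand vector instead of testing membership.

```r
# Hypothetical values, for illustration only
x <- c("level.1", "level.2", "level.3", "level.1")
keep <- c("level.1", "level.2")

# %in% returns one TRUE/FALSE per element of x
is_kept <- x %in% keep
is_kept
# [1]  TRUE  TRUE FALSE  TRUE

# == compares element by element and recycles `keep`,
# so it is not a membership test
x == keep
# [1]  TRUE  TRUE FALSE FALSE
```

This is why == with a vector of several levels can silently drop rows you meant to keep, while %in% does not.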
For example:
df <- data.frame(fct = rep(letters[1:3], times = 2), nums = 1:6)
df
# This is our example data.frame:
#   fct nums
# 1   a    1
# 2   b    2
# 3   c    3
# 4   a    4
# 5   b    5
# 6   c    6
subset(df, df$fct %in% c("a", "b"))
# Subsetting with %in% returns the following output:
#   fct nums
# 1   a    1
# 2   b    2
# 4   a    4
# 5   b    5
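One thing to watch for if the column really is stored as a factor: the levels you filtered out stay attached to the result, which can show up later as empty groups in tables or plots. A minimal sketch, assuming a factor column (df_f and sub_f are hypothetical names introduced here), uses base R's droplevels to remove them:

```r
# Hypothetical example with an explicit factor column
df_f <- data.frame(fct = factor(rep(letters[1:3], times = 2)), nums = 1:6)
sub_f <- subset(df_f, fct %in% c("a", "b"))

levels(sub_f$fct)
# [1] "a" "b" "c"   (the unused level "c" is still present)

# Drop factor levels that no longer occur in the data
sub_f <- droplevels(sub_f)
levels(sub_f$fct)
# [1] "a" "b"
```

If the column is a plain character vector, as data.frame creates by default in R 4.0 and later, this step is not needed.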
Note: Another option is to use the filter function from dplyr, as follows:
library(dplyr)
filter(df, fct %in% c("a", "b"))
This keeps the same rows as the subset call above.