I would like to store a filter condition in a vector in R so that I write less code and reduce the risk of mistakes.
I want to exclude the observations "obs 1", "obs 4", "obs 9", etc. using the subset() function. However, I want to create a vector of these observations to use in the subset() function instead.
Example - how I would like it to be:
excluded <- column1!= "obs 1" & column1!= "obs 4" & column1!= "obs 9"
dataframe <- subset(dataframe, excluded)
Example - What works and what I want to avoid
excluded <- column1!= "obs 1" & column1!= "obs 4" & column1!= "obs 9"
dataframe <- subset(dataframe, column1!= "obs 1" & column1!= "obs 4" & column1!= "obs 9")
I have tried c(), list(), a combination of the two, and column1 <- "column1".
Thank you in advance!
Update with data set example.
set.seed(42)
n <- 12
dataframe <- data.frame(column1=as.character(factor(paste("obs",1:n))),rand=rep(LETTERS[1:2], n/2), x=rnorm(n))
dataframe
# output - first 5 rows:
#   column1 rand        x
# 1   obs 1    A  1.37096
# 2   obs 2    B -0.56470
# 3   obs 3    A  0.36313
# 4   obs 4    B  0.63286
# 5   obs 5    A  0.40427
CodePudding user response:
# load package
library(data.table)
# set as datatable
setDT(dataframe)
# put exclusion criteria into vector
y <- c("obs 1", "obs 4", "obs 9")
# subset
dataframe[!column1 %in% y]
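For comparison, roughly the same filter can be sketched in base R without loading data.table. This is an illustrative alternative, not part of the answer above; it rebuilds the example data from the question so it runs on its own, and the name kept is only a placeholder.

```r
# Rebuild the example data from the question
set.seed(42)
n <- 12
dataframe <- data.frame(column1 = paste("obs", 1:n),
                        rand = rep(LETTERS[1:2], n / 2),
                        x = rnorm(n))

# Same exclusion vector as in the data.table answer
y <- c("obs 1", "obs 4", "obs 9")

# Logical row index; the empty column index keeps all columns.
# `!` binds more loosely than %in%, so this reads as !(column1 %in% y).
kept <- dataframe[!dataframe$column1 %in% y, ]
nrow(kept)
```

Unlike setDT(), which converts the object in place, this leaves dataframe as an ordinary data.frame.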
CodePudding user response:
You should create a logical vector and then use it to subset. The %in% operator avoids the repetition.
excluded <- !(dataframe$column1 %in% c("obs 1", "obs 4", "obs 9"))
# [1] FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
subset(dataframe, excluded)
#    column1 rand           x
# 2    obs 2    B -0.56469817
# 3    obs 3    A  0.36312841
# 5    obs 5    A  0.40426832
# 6    obs 6    B -0.10612452
# 7    obs 7    A  1.51152200
# 8    obs 8    B -0.09465904
# 10  obs 10    B -0.06271410
# 11  obs 11    A  1.30486965
# 12  obs 12    B  2.28664539
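Since the question asks for the observations themselves to live in a vector, a minimal sketch along those lines is to store only the names and let subset() do the comparison. The names to_exclude and result are hypothetical and chosen for illustration; the data is rebuilt from the question so the snippet is self-contained.

```r
# Rebuild the example data from the question
set.seed(42)
n <- 12
dataframe <- data.frame(column1 = paste("obs", 1:n),
                        rand = rep(LETTERS[1:2], n / 2),
                        x = rnorm(n))

# Hypothetical name: the observations to drop, kept in one place
to_exclude <- c("obs 1", "obs 4", "obs 9")

# subset() looks up column1 inside the data frame and
# to_exclude in the environment it was called from
result <- subset(dataframe, !(column1 %in% to_exclude))
result$column1
```

The attempt in the question most likely fails because column1 is evaluated outside subset(), where no object of that name exists unless one was created separately; writing dataframe$column1, or moving the expression inside subset() as here, avoids that.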