How to remove rows in dataframe that meet 2 conditions

Time:08-27

I am trying to remove rows from my dataframe that meet 2 conditions simultaneously. For example, in the dataframe created below, I want to remove only the rows that are both Green AND in Group A. However, the code I am using removes every row that is Green or in Group A.

data = data.frame(Group = c(rep('A',9), rep('B',9)),
                Color= c(rep('Red',3), rep('Green',3), rep('Yellow', 3), rep('Red',4), rep('Green',5)))

summary(data)
names <- c(1:2)
data[,names] <- lapply(data[,names], factor)
summary(data)

newdata <- subset(data, Group != "A" & Color != "Green")
summary(newdata)

How can I get the result I am aiming for?

CodePudding user response:

It sounds like you want this:

             Group A       Not Group A
Green        EXCLUDE       include
Not Green    include       include

Your line subset(data, Group != "A" & Color != "Green") keeps only rows that are BOTH Not Group A and Not Green, which is just the bottom right category. You want rows that are EITHER Not Group A OR Not Green, so replace & (AND) with | (OR) in your condition.

Or, as ~Darren-tsai noted, you could look for rows that are not BOTH A and Green, i.e. !(Group == "A" & Color == "Green").
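
To make the two suggestions above concrete, here is a minimal sketch that rebuilds the data from the question and applies both conditions. The names keep_not_both and keep_either are just illustrative; by De Morgan's law the two filters should select the same rows.

```r
# Rebuild the example data from the question
data <- data.frame(
  Group = c(rep("A", 9), rep("B", 9)),
  Color = c(rep("Red", 3), rep("Green", 3), rep("Yellow", 3),
            rep("Red", 4), rep("Green", 5))
)

# Keep rows that are NOT (Group A AND Green)
keep_not_both <- subset(data, !(Group == "A" & Color == "Green"))

# Equivalent form: keep rows that are Not Group A OR Not Green
keep_either <- subset(data, Group != "A" | Color != "Green")

identical(keep_not_both, keep_either)  # TRUE
nrow(keep_either)                      # 15 (the 3 Green rows in Group A are gone)
```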

CodePudding user response:

As already stated in the comments, here is a base R solution.

data <- data[which(!(data$Group == "A" & data$Color == "Green")), ]
data
   Group  Color
1      A    Red
2      A    Red
3      A    Red
7      A Yellow
8      A Yellow
9      A Yellow
10     B    Red
11     B    Red
12     B    Red
13     B    Red
14     B  Green
15     B  Green
16     B  Green
17     B  Green
18     B  Green
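
A note on why which() can be useful here, sketched with a small hypothetical dataframe (df is not from the question, which has no missing values). If a column contains NA, the logical condition is NA for that row, and plain logical indexing returns an all-NA row, whereas which() drops it.

```r
# Hypothetical data with one missing Color value
df <- data.frame(Group = c("A", "A", "B"),
                 Color = c("Green", NA, "Green"))

# Condition is FALSE for row 1, NA for row 2, TRUE for row 3
cond <- !(df$Group == "A" & df$Color == "Green")

nrow(df[cond, ])         # 2: row 3 plus an all-NA row produced by the NA index
nrow(df[which(cond), ])  # 1: which() keeps only the positions that are TRUE
```

Whether rows with a missing value should be dropped or kept depends on your data, so treat this as an illustration rather than a recommendation.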