I am trying to remove rows in my data frame that meet two conditions simultaneously. For example, based on the data frame created below, I want to remove rows that are both green AND group A. However, the code I am using removes rows that are green OR group A.
data = data.frame(Group = c(rep('A',9), rep('B',9)),
Color= c(rep('Red',3), rep('Green',3), rep('Yellow', 3), rep('Red',4), rep('Green',5)))
summary(data)
names <- c(1:2)
data[,names] <- lapply(data[,names], factor)
summary(data)
newdata <- subset(data, Group != "A" & Color != "Green")
summary(newdata)
How can I get the result I am aiming for?
CodePudding user response:
It sounds like you want this:
            Group A    Not Group A
Green       EXCLUDE    include
Not Green   include    include
Your line subset(data, Group != "A" & Color != "Green") means you are only keeping rows that are BOTH Not Group A and Not Green, which is just the bottom right category. You want things that are EITHER Not Group A or Not Green, which could be done with | (OR) where you have & (AND).
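A minimal sketch of that change, rebuilding the example data from the question so it runs on its own (stringsAsFactors = TRUE is used here as a stand-in for the question's lapply(..., factor) step):

```r
# Rebuild the example data from the question
data <- data.frame(
  Group = c(rep("A", 9), rep("B", 9)),
  Color = c(rep("Red", 3), rep("Green", 3), rep("Yellow", 3),
            rep("Red", 4), rep("Green", 5)),
  stringsAsFactors = TRUE
)

# Keep a row if it is not Group A OR not Green;
# only rows that are both A and Green fail both tests and are dropped
newdata <- subset(data, Group != "A" | Color != "Green")

nrow(newdata)                        # 15 rows remain out of 18
table(newdata$Group, newdata$Color)  # the A/Green cell is now 0
```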
Or, as ~Darren-tsai noted, you could look for rows that are not BOTH A and Green, i.e. !(Group == "A" & Color == "Green").
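The negated form and the OR form select the same rows (this is De Morgan's law). A quick check on the question's example data, with keep_not_both and keep_either as illustrative names only:

```r
# Rebuild the example data from the question
data <- data.frame(
  Group = c(rep("A", 9), rep("B", 9)),
  Color = c(rep("Red", 3), rep("Green", 3), rep("Yellow", 3),
            rep("Red", 4), rep("Green", 5))
)

# Negate the combined condition: drop rows that are both A and Green
keep_not_both <- subset(data, !(Group == "A" & Color == "Green"))

# Equivalent OR form: keep rows that are not A or not Green
keep_either <- subset(data, Group != "A" | Color != "Green")

identical(keep_not_both, keep_either)  # TRUE
```

Which one to use is mostly a readability choice: the negated form reads closest to "remove rows that are A and Green".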
CodePudding user response:
As already stated in the comments, here is a base R solution.
newdata <- data[which(!(data$Group == "A" & data$Color == "Green")), ]
newdata
   Group  Color
1      A    Red
2      A    Red
3      A    Red
7      A Yellow
8      A Yellow
9      A Yellow
10     B    Red
11     B    Red
12     B    Red
13     B    Red
14     B  Green
15     B  Green
16     B  Green
17     B  Green
18     B  Green