I am new to R and there may be a simple solution to this but I'm struggling to find one. I wish to subset a data frame to exclude all rows that don't have both values offered in another row. So, let's say this is my data frame: df1
v1 | v2 | v3 |
---|---|---|
A | 1 | x |
A | 2 | y |
A | 3 | x |
B | 4 | x |
C | 5 | y |
C | 6 | y |
D | 7 | y |
D | 8 | x |
I wish to eliminate any rows that do NOT have both an x and y value (v3) for a corresponding letter (v1) while also keeping all other columns intact (v2) so my final result would be:
v1 | v2 | v3 |
---|---|---|
A | 1 | x |
A | 2 | y |
A | 3 | x |
D | 7 | y |
D | 8 | x |
Only values A and D would be retained because they have both a corresponding x and a corresponding y value. B and C would be eliminated since they only have either x OR y but not both.
I've tried using group_by and filter. The result comes out as an empty data frame:
library(dplyr)
df2 <- df1 %>%
group_by(v1) %>%
filter(all(c('x', 'y') %in% v3))
as well as:
library(dplyr)
df2 <- df1 %>%
group_by(v1) %>%
filter(any(v3 == "x"),
any(v3 == "y"))
CodePudding user response:
df1 %>%
group_by(v1) %>%
filter(all(unique(df1$v3) %in% v3))
# A tibble: 5 x 3
# Groups: v1 [2]
v1 v2 v3
<chr> <int> <chr>
1 A 1 x
2 A 2 y
3 A 3 x
4 D 7 y
5 D 8 x
CodePudding user response:
Try this aggregate
solution
df1[df1$v1 %in% names( which( table(
aggregate( . ~ v3 v1, df1, c )[,"v1"] ) > 1 )),]
v1 v2 v3
1 A 1 x
2 A 2 y
3 A 3 x
7 D 7 y
8 D 8 x
Data
df1 <- structure(list(v1 = c("A", "A", "A", "B", "C", "C", "D", "D"),
v2 = 1:8, v3 = c("x", "y", "x", "x", "y", "y", "y", "x")), class = "data.frame", row.names = c(NA,
-8L))