Home > Net >  R subset rows of same value in one column dependent on multiple values in another column
R subset rows of same value in one column dependent on multiple values in another column

Time:12-04

I am new to R and there may be a simple solution to this but I'm struggling to find one. I wish to subset a data frame to exclude all rows that don't have both values offered in another row. So, let's say this is my data frame: df1

v1 v2 v3
A 1 x
A 2 y
A 3 x
B 4 x
C 5 y
C 6 y
D 7 y
D 8 x

I wish to eliminate any rows that do NOT have both an x and y value (v3) for a corresponding letter (v1) while also keeping all other columns intact (v2) so my final result would be:

v1 v2 v3
A 1 x
A 2 y
A 3 x
D 7 y
D 8 x

Only values A and D would be retained because they have both a corresponding x and a corresponding y value. B and C would be eliminated since they only have either x OR y but not both.

I've tried using group_by and filter. The result comes out as an empty data frame:

library(dplyr)
df2 <- df1 %>% 
     group_by(v1) %>%
     filter(all(c('x', 'y') %in% v3))

as well as:

library(dplyr)
df2 <- df1 %>% 
   group_by(v1) %>% 
   filter(any(v3 == "x"),
          any(v3 == "y"))

CodePudding user response:

df1 %>%
   group_by(v1) %>%
   filter(all(unique(df1$v3) %in% v3))

# A tibble: 5 x 3
# Groups:   v1 [2]
  v1       v2 v3   
  <chr> <int> <chr>
1 A         1 x    
2 A         2 y    
3 A         3 x    
4 D         7 y    
5 D         8 x  

CodePudding user response:

Try this aggregate solution

df1[df1$v1 %in% names( which( table( 
  aggregate( . ~ v3   v1, df1, c )[,"v1"] ) > 1 )),]

  v1 v2 v3
1  A  1  x
2  A  2  y
3  A  3  x
7  D  7  y
8  D  8  x

Data

df1 <- structure(list(v1 = c("A", "A", "A", "B", "C", "C", "D", "D"),
    v2 = 1:8, v3 = c("x", "y", "x", "x", "y", "y", "y", "x")), class = "data.frame", row.names = c(NA,
-8L))
  • Related