Home > Blockchain >  Retain all groups in a query when two conditions are met across two columns
Retain all groups in a query when two conditions are met across two columns

Time:08-10

I have a data frame constructed similarly to this:

df <- 
  read.table(textConnection("ID   Column1   Column2
A 1 0
A 0 1
B 1 0
B 0 0 
C 1 0
D 1 0 
D 0 1"), header=TRUE)
ID Column1 Column2
A 1 0
A 0 1
B 1 0
B 0 0
C 1 0
C 0 0
D 1 0
D 0 1

I am looking to filter the data frame so that rows are kept only when there is a value of '1' in Column1 and also a value of '1' in Column2.

So output in this case would be

ID Column1 Column2
A 1 0
A 0 1
D 1 0
D 0 1

Any help will be very much appreciated, thank you!

CodePudding user response:

Using dplyr

df %>% 
  group_by(ID) %>% 
  filter(sum(Column1)>=1 & sum(Column2)>=1)

gives out

# A tibble: 4 × 3
# Groups:   ID [2]
  ID    Column1 Column2
  <chr>   <int>   <int>
1 A           1       0
2 A           0       1
3 D           1       0
4 D           0       1

CodePudding user response:

We could also use any:

df |>
  group_by(ID) |> 
  filter(any(Column1 == 1) & any(Column2 == 1)) |>
  ungroup()

Output:

# A tibble: 4 × 3
  ID    Column1 Column2
  <chr>   <int>   <int>
1 A           1       0
2 A           0       1
3 D           1       0
4 D           0       1

CodePudding user response:

Same principle but using data.table like this:

df <- 
  read.table(textConnection("ID   Column1   Column2
A 1 0
A 0 1
B 1 0
B 0 0 
C 1 0
D 1 0 
D 0 1"), header=TRUE)

library(data.table)
setDT(df)[,.SD[any(Column1 == 1) & any(Column2 == 1)], by=ID]
#>    ID Column1 Column2
#> 1:  A       1       0
#> 2:  A       0       1
#> 3:  D       1       0
#> 4:  D       0       1

Created on 2022-08-10 by the reprex package (v2.0.1)

  • Related