I have a data frame constructed similarly to this:
df <-
read.table(textConnection("ID Column1 Column2
A 1 0
A 0 1
B 1 0
B 0 0
C 1 0
D 1 0
D 0 1"), header=TRUE)
ID | Column1 | Column2 |
---|---|---|
A | 1 | 0 |
A | 0 | 1 |
B | 1 | 0 |
B | 0 | 0 |
C | 1 | 0 |
C | 0 | 0 |
D | 1 | 0 |
D | 0 | 1 |
I am looking to filter the data frame so that rows are kept only when there is a value of '1' in Column1 and also a value of '1' in Column2.
So output in this case would be
ID | Column1 | Column2 |
---|---|---|
A | 1 | 0 |
A | 0 | 1 |
D | 1 | 0 |
D | 0 | 1 |
Any help will be very much appreciated, thank you!
CodePudding user response:
Using dplyr
df %>%
group_by(ID) %>%
filter(sum(Column1)>=1 & sum(Column2)>=1)
gives out
# A tibble: 4 × 3
# Groups: ID [2]
ID Column1 Column2
<chr> <int> <int>
1 A 1 0
2 A 0 1
3 D 1 0
4 D 0 1
CodePudding user response:
We could also use any
:
df |>
group_by(ID) |>
filter(any(Column1 == 1) & any(Column2 == 1)) |>
ungroup()
Output:
# A tibble: 4 × 3
ID Column1 Column2
<chr> <int> <int>
1 A 1 0
2 A 0 1
3 D 1 0
4 D 0 1
CodePudding user response:
Same principle but using data.table
like this:
df <-
read.table(textConnection("ID Column1 Column2
A 1 0
A 0 1
B 1 0
B 0 0
C 1 0
D 1 0
D 0 1"), header=TRUE)
library(data.table)
setDT(df)[,.SD[any(Column1 == 1) & any(Column2 == 1)], by=ID]
#> ID Column1 Column2
#> 1: A 1 0
#> 2: A 0 1
#> 3: D 1 0
#> 4: D 0 1
Created on 2022-08-10 by the reprex package (v2.0.1)