Is it possible to delete rows from a data frame based on conditions on its variables? Suppose I have this table:
Number | Value |
---|---|
1 | TRUE |
1 | FALSE |
2 | FALSE |
2 | FALSE |
3 | FALSE |
3 | TRUE |
4 | FALSE |
4 | FALSE |
5 | TRUE |
5 | FALSE |
I want exactly one row for each Number. Where a Number has a TRUE row, I want to delete its FALSE row; where both rows for a Number are FALSE, I will just delete one of them. This should leave me with a table like:
Number | Value |
---|---|
1 | TRUE |
2 | FALSE |
3 | TRUE |
4 | FALSE |
5 | TRUE |
Is it possible to filter by number then delete the first false value? Or anything similar to that?
CodePudding user response:
You can sort with arrange() and then keep one row per Number with distinct():
library(dplyr)
df %>%
  arrange(Number, !Value) %>%
  distinct(Number, .keep_all = TRUE)
# Number Value
#1 1 TRUE
#2 2 FALSE
#3 3 TRUE
#4 4 FALSE
#5 5 TRUE
arrange(Number, !Value) puts the TRUE values ahead of the FALSE ones within each Number, and distinct() then keeps the first row for each Number.
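For readers without dplyr, the same sort-then-keep-first idea can be sketched in base R. This is only a sketch under assumptions: it rebuilds the question's table under the name df so that it runs on its own, and sorted and result are illustrative names, not part of the original answer.

```r
# Rebuild the example table from the question (assumed to be called df).
df <- data.frame(
  Number = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  Value  = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)
)

# Sort by Number; within each Number, TRUE rows come first
# because !Value is FALSE for them and FALSE sorts before TRUE.
sorted <- df[order(df$Number, !df$Value), ]

# Keep only the first row encountered for each Number.
result <- sorted[!duplicated(sorted$Number), ]
rownames(result) <- NULL
result
```

Because duplicated() flags every occurrence after the first, this keeps exactly one row per Number even if a Number has more than two rows.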
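Another option is to test a condition within each group: keep the TRUE rows if the group has any, otherwise keep only its first row.
df %>%
  group_by(Number) %>%
  # any(Value) is TRUE when the group contains at least one TRUE row
  filter(if (any(Value)) Value else row_number() == 1) %>%
  ungroup()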
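The per-group condition can also be written without dplyr. The following is a rough base R sketch, assuming the question's table is stored as df (rebuilt here so the snippet runs on its own); keep and result are illustrative names.

```r
# Rebuild the example table from the question (assumed to be called df).
df <- data.frame(
  Number = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  Value  = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)
)

# For each Number: if the group has any TRUE, keep the TRUE rows;
# otherwise keep only the first row of the group.
keep <- as.logical(ave(
  df$Value, df$Number,
  FUN = function(v) if (any(v)) v else seq_along(v) == 1
))

result <- df[keep, ]
rownames(result) <- NULL
result
```

As with the grouped filter, a Number that has several TRUE rows would keep all of them, so this matches the desired output only when each Number has at most one TRUE.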
CodePudding user response:
Another approach:
library(dplyr)
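df %>%
  group_by(Number) %>%
  # If every Value in the group is FALSE, keep the first row; otherwise keep the TRUE rows.
  # all(!Value) also works for groups that do not have exactly two rows.
  filter(if (all(!Value)) row_number() == 1 else Value)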
# A tibble: 5 x 2
# Groups: Number [5]
Number Value
<int> <lgl>
1 1 TRUE
2 2 FALSE
3 3 TRUE
4 4 FALSE
5 5 TRUE