I am working with the R programming language.
I have the following dataset:
id = c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3)
col1 = c(0,0,1,1,0,0,1,0,0,1,1,0,1,0,1,0)
col2 = c("A", "B", "A","A", "B", "A","A", "B", "A","A", "B", "A","A", "B", "A", "B")
my_data = data.frame(id, col1, col2)
my_data$row_num = 1:nrow(my_data)
For each unique id, I want to keep the rows up to and including the first row where col1 = 1 OR col2 = "A", and delete all remaining rows that occur AFTER it.
I found this question (How to Filter out Rows per Group after Condition Occurrs), which has an answer to a similar question. I tried to adapt that answer to my problem:
library(dplyr)
my_data %>%
  group_by(id) %>%
  slice(seq_len(which((col1 == 1) | (col2 == "A"))[1]))
Can someone please confirm if I have done this correctly? I am not sure if I have correctly inserted the OR statement within the "slice" function.
Thanks!
CodePudding user response:
Not sure if there's some tidy magic that can do the job, but here's the "dumb" approach: partition the dataset by ID and loop through each of the parts:
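# Start with an empty data frame that has the same columns as my_data
filtered_data <- data.frame(matrix(NA, nrow=0, ncol=4))
colnames(filtered_data) <- colnames(my_data)
rows_added <- 0
for(id in 1:3) {
  relevant_data <- my_data[my_data$id == id,]
  # Loop over the rows of this group only, not of the whole dataset
  for(row in 1:nrow(relevant_data)) {
    rows_added <- rows_added + 1
    filtered_data[rows_added,] <- relevant_data[row,]
    jump_condition <- relevant_data[row, "col1"] == 1 | relevant_data[row, "col2"] == "A"
    # Stop copying rows for this id once the condition has been met
    if(jump_condition) {
      break
    }
  }
}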
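If you'd rather avoid the explicit row counter, the same partition-by-ID idea can probably be written with split() and lapply() in base R. This is only a sketch: keep_until_first_hit() is a made-up helper name, and keeping the whole group when no row matches is my assumption, since the question doesn't say what should happen in that case.

```r
# Same data as in the question
id   <- c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3)
col1 <- c(0,0,1,1,0,0,1,0,0,1,1,0,1,0,1,0)
col2 <- c("A", "B", "A","A", "B", "A","A", "B", "A","A", "B", "A","A", "B", "A", "B")
my_data <- data.frame(id, col1, col2)
my_data$row_num <- 1:nrow(my_data)

# Hypothetical helper (not from any package): keep the rows of one group
# up to and including the first row where the condition holds
keep_until_first_hit <- function(d) {
  hit <- which(d$col1 == 1 | d$col2 == "A")
  # Assumption: if no row in the group matches, keep the whole group
  if (length(hit) == 0) return(d)
  d[seq_len(hit[1]), , drop = FALSE]
}

# Split by id, trim each part, and stack the parts back together
parts <- split(my_data, my_data$id)
filtered_base <- do.call(rbind, lapply(parts, keep_until_first_hit))
rownames(filtered_base) <- NULL
filtered_base
```

On the question's data this should keep the rows with row_num 1, 5, 6 and 10.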
CodePudding user response:
You could filter on row_number(), keeping every row up to and including the first row where the condition holds (found using which), like this:
id = c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3)
col1 = c(0,0,1,1,0,0,1,0,0,1,1,0,1,0,1,0)
col2 = c("A", "B", "A","A", "B", "A","A", "B", "A","A", "B", "A","A", "B", "A", "B")
my_data = data.frame(id, col1, col2)
my_data$row_num = 1:nrow(my_data)
library(dplyr)
my_data %>%
  group_by(id) %>%
  filter(row_number() <= min(which((col1 == 1) | (col2 == "A"))))
#> # A tibble: 4 × 4
#> # Groups:   id [3]
#>      id  col1 col2  row_num
#>   <dbl> <dbl> <chr>   <int>
#> 1     1     0 A           1
#> 2     2     0 B           5
#> 3     2     0 A           6
#> 4     3     1 A          10
Created on 2023-01-20 with reprex v2.0.2
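One thing to watch for: if some id has no row that meets the condition, which() returns an empty vector. As far as I can tell, min() then returns Inf with a warning (so the whole group is kept), while the which(...)[1] version from the question gives NA and seq_len() errors. A possible variant that avoids both is to count the hits seen before each row with cumsum() and lag(). This is only a sketch: my_data2 and the hit column are made up for illustration, and keeping a group with no match in full is an assumption.

```r
library(dplyr)

# Made-up example data: id 4 never meets the condition
my_data2 <- data.frame(
  id   = c(1, 1, 1, 2, 2, 4, 4),
  col1 = c(0, 0, 1, 0, 1, 0, 0),
  col2 = c("B", "A", "A", "B", "B", "B", "B")
)

result <- my_data2 %>%
  group_by(id) %>%
  # hit is TRUE on every row where the condition holds
  mutate(hit = (col1 == 1) | (col2 == "A")) %>%
  # lag(cumsum(hit)) is the number of hits strictly before the current row,
  # so this keeps everything up to and including the first hit
  filter(lag(cumsum(hit), default = 0L) == 0) %>%
  ungroup() %>%
  select(-hit)

result
```

Because the filter only looks at hits on earlier rows, a group without any hit passes through unchanged and no warning is raised.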