I would like to remove all rows before a specific factor (in this case the first Running in Run column) and all rows after the last specific case (in this case also Running in Run Column).
data <- tribble( ~Date, ~Time, ~Run, "17/04/12", "00:10:00", "Stop", "17/04/12", "00:19:00", "Stop", "17/04/12", "00:25:00", "Running", "17/04/12", "00:29:00", "Running", "17/04/12", "00:25:00", "Stop", "17/04/12", "00:29:00", "Running", "17/04/12", "00:30:00", "Running", "18/04/12", "00:10:00", "Stop", "18/04/12", "00:14:00", "Running", "18/04/12", "00:20:00", "Stop" "18/04/12", "00:24:00", "Stop" "18/04/12", "00:26:00", "Stop" )
Date | Time | Run |
---|---|---|
17/04/12 | 00:10:00 | |
17/04/12 | 00:19:00 | |
17/04/12 | 00:25:00 | Running |
17/04/12 | 00:29:00 | Running |
17/04/12 | 00:25:00 | Stop |
17/04/12 | 00:29:00 | Running |
17/04/12 | 00:30:00 | Running |
18/04/12 | 00:10:00 | Stop |
18/04/12 | 00:14:00 | Running |
18/04/12 | 00:20:00 | Stop |
18/04/12 | 00:24:00 | Stop |
18/04/12 | 00:26:00 | Stop |
Result:
Date | Time | Run |
---|---|---|
17/04/12 | 00:25:00 | Running |
17/04/12 | 00:29:00 | Running |
17/04/12 | 00:25:00 | Stop |
17/04/12 | 00:29:00 | Running |
17/04/12 | 00:30:00 | Running |
18/04/12 | 00:10:00 | Stop |
18/04/12 | 00:14:00 | Running |
CodePudding user response:
Here's a solution using dplyr
library(dplyr)
df %>%
filter(Run %in% c("Running", "Stop")) %>%
slice(1:max(which(Run == "Running")))
This gives us:
# A tibble: 7 x 3
Date Time Run
<chr> <chr> <chr>
1 17/04/12 00:25:00 Running
2 17/04/12 00:29:00 Running
3 17/04/12 00:25:00 Stop
4 17/04/12 00:29:00 Running
5 17/04/12 00:30:00 Running
6 18/04/12 00:10:00 Stop
7 18/04/12 00:14:00 Running
CodePudding user response:
Along your data analysis journey, you will face this problem often. If you want to or will have to add more logical checks, you can introduce variables on which you filter. In your case we can simply count the occurrences of Running and Stop. We then use the additional columns to filter your data frame.
library(dplyr)
data %>%
mutate( RunCounts = cumsum(Run == "Running")
, StopCounts = cumsum(Run == "Stop")
) %>%
filter(RunCounts >= 1 & StopCounts < max(StopCounts))
This yields:
# A tibble: 7 x 5
Date Time Run RunCounts StopCounts
<chr> <chr> <chr> <int> <int>
1 17/04/12 00:25:00 Running 1 2
2 17/04/12 00:29:00 Running 2 2
3 17/04/12 00:25:00 Stop 2 3
4 17/04/12 00:29:00 Running 3 3
5 17/04/12 00:30:00 Running 4 3
6 18/04/12 00:10:00 Stop 4 4
7 18/04/12 00:14:00 Running 5 4
You can remove the introduced columns by selecting the ones you are interested in:
data %>% mutate(RunCounts = cumsum(Run == "Running"), StopCounts = cumsum(Run == "Stop")) %>% filter(RunCounts >= 1 & StopCounts < max(StopCounts)) %>% select(Date, Time, Run)
# A tibble: 7 x 3
Date Time Run
<chr> <chr> <chr>
1 17/04/12 00:25:00 Running
2 17/04/12 00:29:00 Running
3 17/04/12 00:25:00 Stop
4 17/04/12 00:29:00 Running
5 17/04/12 00:30:00 Running
6 18/04/12 00:10:00 Stop
7 18/04/12 00:14:00 Running