I have a data frame (testdf_long) with 1632 obs. of 3 variables (year, model, and value). Here are the first 34 rows of the data frame:
year model value
<chr> <chr> <dbl>
1 2015-2020 RGI60.01.01854_ACCESS.CM2_ssp126 0
2 2020-2025 RGI60.01.01854_ACCESS.CM2_ssp126 0
3 2025-2030 RGI60.01.01854_ACCESS.CM2_ssp126 0
4 2030-2035 RGI60.01.01854_ACCESS.CM2_ssp126 0
5 2035-2040 RGI60.01.01854_ACCESS.CM2_ssp126 1
6 2040-2045 RGI60.01.01854_ACCESS.CM2_ssp126 1
7 2045-2050 RGI60.01.01854_ACCESS.CM2_ssp126 1
8 2050-2055 RGI60.01.01854_ACCESS.CM2_ssp126 1
9 2055-2060 RGI60.01.01854_ACCESS.CM2_ssp126 0
10 2060-2065 RGI60.01.01854_ACCESS.CM2_ssp126 1
11 2065-2070 RGI60.01.01854_ACCESS.CM2_ssp126 1
12 2070-2075 RGI60.01.01854_ACCESS.CM2_ssp126 1
13 2075-2080 RGI60.01.01854_ACCESS.CM2_ssp126 1
14 2080-2085 RGI60.01.01854_ACCESS.CM2_ssp126 1
15 2085-2090 RGI60.01.01854_ACCESS.CM2_ssp126 1
16 2090-2095 RGI60.01.01854_ACCESS.CM2_ssp126 1
17 2095-2100 RGI60.01.01854_ACCESS.CM2_ssp126 1
18 2015-2020 RGI60.01.01854_ACCESS.ESM1.5_ssp126 0
19 2020-2025 RGI60.01.01854_ACCESS.ESM1.5_ssp126 0
20 2025-2030 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
21 2030-2035 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
22 2035-2040 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
23 2040-2045 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
24 2045-2050 RGI60.01.01854_ACCESS.ESM1.5_ssp126 0
25 2050-2055 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
26 2055-2060 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
27 2060-2065 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
28 2065-2070 RGI60.01.01854_ACCESS.ESM1.5_ssp126 0
29 2070-2075 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
30 2075-2080 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
31 2080-2085 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
32 2085-2090 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
33 2090-2095 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
34 2095-2100 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
The value column is binary where 1 indicates that a critical temperature value was exceeded, and 0 indicates the temperature is below that critical value.
I would like to identify the first time there are at least two consecutive 5-year bins where the critical temperature is exceeded (the value == 1 in at least two consecutive rows) for each model. There are 96 models in total, and I am showing the first 2 models here.
My desired output would be something like:
2035-2040 RGI60.01.01854_ACCESS.CM2_ssp126
2025-2030 RGI60.01.01854_ACCESS.ESM1.5_ssp126
Looking for data.table (or dplyr) solutions.
CodePudding user response:
If your data set name is df
then ,
newdf <- df %>% group_by(model) %>% mutate(cumval = cumsum(value))
newdf[which(newdf$cumval == 2)-1 ,][1:2]
# A tibble: 2 × 2
# Groups: model [2]
year model
<chr> <chr>
1 2035-2040 RGI60.01.01854_ACCESS.CM2_ssp126
2 2025-2030 RGI60.01.01854_ACCESS.ESM1.5_ssp126
CodePudding user response:
One approach is to simply filter
where the value
is 1 as well as the lead
(following row) value
is also 1. Then, if grouped by model
, you can slice(1)
to get the first row/occurrence for a given model
.
library(tidyverse)
testdf_long %>%
group_by(model) %>%
filter(value == 1 & lead(value) == 1) %>%
slice(1)
Output
year model value
<chr> <chr> <int>
1 2035-2040 RGI60.01.01854_ACCESS.CM2_ssp126 1
2 2025-2030 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1