Home > Blockchain >  Identify first group of two consecutive numbers in data frame by group
Identify first group of two consecutive numbers in data frame by group

Time:06-05

I have a data frame (testdf_long) with 1632 obs. of 3 variables (year, model, and value). Here are the first 34 rows of the data frame:

year      model                            value
<chr>     <chr>                            <dbl>
1 2015-2020 RGI60.01.01854_ACCESS.CM2_ssp126     0
2 2020-2025 RGI60.01.01854_ACCESS.CM2_ssp126     0
3 2025-2030 RGI60.01.01854_ACCESS.CM2_ssp126     0
4 2030-2035 RGI60.01.01854_ACCESS.CM2_ssp126     0
5 2035-2040 RGI60.01.01854_ACCESS.CM2_ssp126     1
6 2040-2045 RGI60.01.01854_ACCESS.CM2_ssp126     1
7 2045-2050 RGI60.01.01854_ACCESS.CM2_ssp126     1
8 2050-2055 RGI60.01.01854_ACCESS.CM2_ssp126     1
9 2055-2060 RGI60.01.01854_ACCESS.CM2_ssp126     0
10 2060-2065 RGI60.01.01854_ACCESS.CM2_ssp126    1
11 2065-2070 RGI60.01.01854_ACCESS.CM2_ssp126    1
12 2070-2075 RGI60.01.01854_ACCESS.CM2_ssp126    1
13 2075-2080 RGI60.01.01854_ACCESS.CM2_ssp126    1
14 2080-2085 RGI60.01.01854_ACCESS.CM2_ssp126    1
15 2085-2090 RGI60.01.01854_ACCESS.CM2_ssp126    1
16 2090-2095 RGI60.01.01854_ACCESS.CM2_ssp126    1
17 2095-2100 RGI60.01.01854_ACCESS.CM2_ssp126    1
18 2015-2020 RGI60.01.01854_ACCESS.ESM1.5_ssp126 0
19 2020-2025 RGI60.01.01854_ACCESS.ESM1.5_ssp126 0
20 2025-2030 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
21 2030-2035 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
22 2035-2040 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
23 2040-2045 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
24 2045-2050 RGI60.01.01854_ACCESS.ESM1.5_ssp126 0
25 2050-2055 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
26 2055-2060 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
27 2060-2065 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
28 2065-2070 RGI60.01.01854_ACCESS.ESM1.5_ssp126 0
29 2070-2075 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
30 2075-2080 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
31 2080-2085 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
32 2085-2090 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
33 2090-2095 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1
34 2095-2100 RGI60.01.01854_ACCESS.ESM1.5_ssp126 1

The value column is binary where 1 indicates that a critical temperature value was exceeded, and 0 indicates the temperature is below that critical value.

I would like to identify the first time there are at least two consecutive 5-year bins where the critical temperature is exceeded (the value == 1 in at least two consecutive rows) for each model. There are 96 models in total, and I am showing the first 2 models here.

My desired output would be something like:

2035-2040 RGI60.01.01854_ACCESS.CM2_ssp126
2025-2030 RGI60.01.01854_ACCESS.ESM1.5_ssp126 

Looking for data.table (or dplyr) solutions.

CodePudding user response:

If your data set name is df then ,

newdf <- df %>% group_by(model) %>% mutate(cumval = cumsum(value))

newdf[which(newdf$cumval == 2)-1 ,][1:2]

# A tibble: 2 × 2
# Groups:   model [2]
  year      model                              
  <chr>     <chr>                               
1 2035-2040 RGI60.01.01854_ACCESS.CM2_ssp126        
2 2025-2030 RGI60.01.01854_ACCESS.ESM1.5_ssp126     

CodePudding user response:

One approach is to simply filter where the value is 1 as well as the lead (following row) value is also 1. Then, if grouped by model, you can slice(1) to get the first row/occurrence for a given model.

library(tidyverse)

testdf_long %>%
  group_by(model) %>%
  filter(value == 1 & lead(value) == 1) %>%
  slice(1)

Output

  year      model                               value
  <chr>     <chr>                               <int>
1 2035-2040 RGI60.01.01854_ACCESS.CM2_ssp126        1
2 2025-2030 RGI60.01.01854_ACCESS.ESM1.5_ssp126     1
  • Related