Home > Blockchain >  label latest sequence of ascending numbers
label latest sequence of ascending numbers

Time:10-18

I have a scenario where I want to label the most recent seq of numbers. The only thing I know about the seq is that it is ascending. When it stops doing that, I'd like a label. But I only need a label for the most recent sequence of ascending numbers. So here is an example of the data I have:

library(tibble)

n <- 5

x <- tibble(
  date = seq.Date(as.Date("2022-10-05"), Sys.Date(), by = "days"),
  split_seq = c(seq(1, 10, length.out = n), seq(1, 10, length.out = length(date) - n))
)

x
#> # A tibble: 13 × 2
#>    date       split_seq
#>    <date>         <dbl>
#>  1 2022-10-05      1   
#>  2 2022-10-06      3.25
#>  3 2022-10-07      5.5 
#>  4 2022-10-08      7.75
#>  5 2022-10-09     10   
#>  6 2022-10-10      1   
#>  7 2022-10-11      2.29
#>  8 2022-10-12      3.57
#>  9 2022-10-13      4.86
#> 10 2022-10-14      6.14
#> 11 2022-10-15      7.43
#> 12 2022-10-16      8.71
#> 13 2022-10-17     10

This is an example of what I'd like with the label_col added in:

#> # A tibble: 13 × 3
#>    date       split_seq label_col         
#>    <date>         <dbl> <chr>             
#>  1 2022-10-05      1    not last seq label
#>  2 2022-10-06      3.25 not last seq label
#>  3 2022-10-07      5.5  not last seq label
#>  4 2022-10-08      7.75 not last seq label
#>  5 2022-10-09     10    not last seq label
#>  6 2022-10-10      1    last seq label    
#>  7 2022-10-11      2.29 last seq label    
#>  8 2022-10-12      3.57 last seq label    
#>  9 2022-10-13      4.86 last seq label    
#> 10 2022-10-14      6.14 last seq label    
#> 11 2022-10-15      7.43 last seq label    
#> 12 2022-10-16      8.71 last seq label    
#> 13 2022-10-17     10    last seq label

A dplyr specific answer would be nice though not required.

CodePudding user response:

Something like this?

library(dplyr, warn.conflicts = FALSE)

x |> 
  mutate(
    is_increase = lag(split_seq) < split_seq,
    is_first_seq = !as.logical(cumsum(!coalesce(is_increase, TRUE))),
    label_col = if_else(
      is_first_seq,
      "not last seq label",
      "last seq label"
    )
  ) |> 
  select(-c(is_increase, is_first_seq))
#> # A tibble: 13 × 3
#>    date       split_seq label_col         
#>    <date>         <dbl> <chr>             
#>  1 2022-10-05      1    not last seq label
#>  2 2022-10-06      3.25 not last seq label
#>  3 2022-10-07      5.5  not last seq label
#>  4 2022-10-08      7.75 not last seq label
#>  5 2022-10-09     10    not last seq label
#>  6 2022-10-10      1    last seq label    
#>  7 2022-10-11      2.29 last seq label    
#>  8 2022-10-12      3.57 last seq label    
#>  9 2022-10-13      4.86 last seq label    
#> 10 2022-10-14      6.14 last seq label    
#> 11 2022-10-15      7.43 last seq label    
#> 12 2022-10-16      8.71 last seq label    
#> 13 2022-10-17     10    last seq label

Created on 2022-10-17 with reprex v2.0.2

Note that it might make for clearer code if you abstract the above logic into a function. E.g. something like this:

index_sequences <- function(x) {
  cumsum(coalesce(lag(x) > x, 1)) 
}

index_sequences(c(1:4, 3:5, 1:3))
#>  [1] 1 1 1 1 2 2 2 3 3 3

Created on 2022-10-17 with reprex v2.0.2

CodePudding user response:

Using x<lag(x) to check if the value reset. Then rev() to match() the list in reverse to find the last value reset. With the location (from the end) of the last reset, subtracted from the length of the vector to find the index. Finally, ifelse() on the index to assign the strings:

Base:

x$c<-ifelse(1:length(x$split_seq) > length(x$split_seq)-match(T, rev(x$split_seq<lag(x$split_seq))),
            "last seq label",
            "not last seq label")

Dplyr:

x<-x %>% 
  mutate(d = if_else(row_number() > (length(split_seq)-match(T, rev(split_seq<lag(split_seq)))),
                    "last seq label",
                    "not last seq label"))

CodePudding user response:

My approach is similar, but has less intermediate variables:

library(dplyr)

x %>% 
  mutate(col_label = cumsum(!split_seq > lag(split_seq, default = first(split_seq))) %>% 
           { ifelse(. == max(.),
                  "last seq label",
                  "not last seq label") }
         )

#> # A tibble: 13 × 3
#>    date       split_seq col_label         
#>    <date>         <dbl> <chr>             
#>  1 2022-10-05      1    not last seq label
#>  2 2022-10-06      3.25 not last seq label
#>  3 2022-10-07      5.5  not last seq label
#>  4 2022-10-08      7.75 not last seq label
#>  5 2022-10-09     10    not last seq label
#>  6 2022-10-10      1    last seq label    
#>  7 2022-10-11      2.29 last seq label    
#>  8 2022-10-12      3.57 last seq label    
#>  9 2022-10-13      4.86 last seq label    
#> 10 2022-10-14      6.14 last seq label    
#> 11 2022-10-15      7.43 last seq label    
#> 12 2022-10-16      8.71 last seq label    
#> 13 2022-10-17     10    last seq label

Created on 2022-10-17 with reprex v2.0.2

  • Related