Checking if a statement is true for three consecutive years in R-CodePudding

I am trying to create some code to determine if a certain statement is true for three consecutive years. This statement is, for example, that the amount needs to be negative for three consecutive years. Only if that is true, the third row should be marked in a new column with a 1, otherwise, the rows should be marked as a 0.

To further elaborate on what I try to do, I've created the following dataset:

ID	Year	Amount
001	2022	2
001	2021	-9
001	2020	-10
001	2019	-16
001	2018	-20
002	2022	400
002	2021	300
002	2020	-200
002	2019	-600
002	2018	-500

And the outcome should look somewhat like this:

ID	Year	Amount	Y/N
001	2022	2	0
001	2021	-9	1
001	2020	-10	1
001	2019	-16	0
001	2018	-20	0
002	2022	400	0
002	2021	300	0
002	2020	-200	1
002	2019	-600	0
002	2018	-500	0

Does anybody know how I could code this?

CodePudding user response：

This should do it :

df <- read.table(h=T, text="ID  Year    Amount
001 2022    2
001 2021    -9
001 2020    -10
001 2019    -16
001 2018    -20
002 2022    400
002 2021    300
002 2020    -200
002 2019    -600
002 2018    -500")

library(tidyverse)
df %>% 
  group_by(ID) %>% 
  mutate(YN = rev(diff(c(0,0,0, cumsum(rev(Amount)<0)), 3)  == 3)) %>%
  ungroup()
#> # A tibble: 10 × 4
#>       ID  Year Amount YN   
#>    <int> <int>  <int> <lgl>
#>  1     1  2022      2 FALSE
#>  2     1  2021     -9 TRUE 
#>  3     1  2020    -10 TRUE 
#>  4     1  2019    -16 FALSE
#>  5     1  2018    -20 FALSE
#>  6     2  2022    400 FALSE
#>  7     2  2021    300 FALSE
#>  8     2  2020   -200 TRUE 
#>  9     2  2019   -600 FALSE
#> 10     2  2018   -500 FALSE

^{Created on 2022-04-21 by the reprex package (v2.0.1)}

In base R:

df$YN <- ave(df$Amount, df$ID, FUN = function(x) rev(diff(c(0,0,0, cumsum(rev(x)<0)), 3)  == 3)) == 1

Here's a walkthrough :

df %>% 
  arrange(ID, Year) %>% 
  group_by(ID) %>% 
  mutate(
    cond = Amount < 0,
    cumsum = cumsum(cond),
    diff = diff(c(0,0,0, cumsum), 3),
    YN = diff == 3) # %>% arrange(ID, desc(Year)) # to put back in original shape
#> # A tibble: 10 × 7
#> # Groups:   ID [2]
#>       ID  Year Amount cond  cumsum  diff YN   
#>    <int> <int>  <int> <lgl>  <int> <dbl> <lgl>
#>  1     1  2018    -20 TRUE       1     1 FALSE
#>  2     1  2019    -16 TRUE       2     2 FALSE
#>  3     1  2020    -10 TRUE       3     3 TRUE 
#>  4     1  2021     -9 TRUE       4     3 TRUE 
#>  5     1  2022      2 FALSE      4     2 FALSE
#>  6     2  2018   -500 TRUE       1     1 FALSE
#>  7     2  2019   -600 TRUE       2     2 FALSE
#>  8     2  2020   -200 TRUE       3     3 TRUE 
#>  9     2  2021    300 FALSE      3     2 FALSE
#> 10     2  2022    400 FALSE      3     1 FALSE

And an easier solution to read, not generalisable to big spans:

library(tidyverse)
df %>% 
  group_by(ID) %>% 
  mutate(YN = (Amount < 0)   lead(Amount < 0, default = 0)   lead(Amount < 0, 2, default = 0) == 3) %>% 
  ungroup()

CodePudding user response：

Using the dataset

df <- data.frame(id = c(rep(1,5,),rep(2,5)), year = rep(seq(2022,2018),2), amount = c(2,-9,-10,-16,-20,400,300,-200,-600,-500))

A sapply function can be used:

df$test <- ifelse(sapply(seq(1,nrow(df)), function(x) all(subset(df, df$id == df$id[x] & df$year %in% seq(df$year[x]-2,df$year[x]))$amount < 0) & nrow(subset(df, df$id == df$id[x] & df$year %in% seq(df$year[x]-2,df$year[x]))) > 2),1,0)

That gives:

df
       id year amount test
    1   1 2022      2    0
    2   1 2021     -9    1
    3   1 2020    -10    1
    4   1 2019    -16    0
    5   1 2018    -20    0
    6   2 2022    400    0
    7   2 2021    300    0
    8   2 2020   -200    1
    9   2 2019   -600    0
    10  2 2018   -500    0

Breakdown:

ifelse to change the TRUE/FALSE from the SAPPLY to 1/0

sapply to iterate per row

all to test whether the condition holds for all values inside the subset

subset to select the last three years and the correct id

$amount < 0 to test the condition

CodePudding user response：

You could use data.table package.

library(data.table)

setDT(df)

df[order(Year), YN := match(Reduce(` `, shift(Amount<0, 0:2)), 3, 0), ID]

#        ID  Year Amount    YN
#  1:     1  2022      2     0
#  2:     1  2021     -9     1
#  3:     1  2020    -10     1
#  4:     1  2019    -16     0
#  5:     1  2018    -20     0
#  6:     2  2022    400     0
#  7:     2  2021    300     0
#  8:     2  2020   -200     1
#  9:     2  2019   -600     0
# 10:     2  2018   -500     0

data

df = structure(list(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), 
                    Year = c(2022, 2021, 2020, 2019, 2018, 2022, 2021, 2020, 2019, 2018), 
                    Amount = c(2, -9, -10, -16, -20, 400, 300, -200, -600, -500)), 
               row.names = c(NA, -10L), 
               class = "data.frame")