Home > database >  Checking if a statement is true for three consecutive years in R
Checking if a statement is true for three consecutive years in R

Time:04-22

I am trying to create some code to determine if a certain statement is true for three consecutive years. This statement is, for example, that the amount needs to be negative for three consecutive years. Only if that is true, the third row should be marked in a new column with a 1, otherwise, the rows should be marked as a 0.

To further elaborate on what I try to do, I've created the following dataset:

ID Year Amount
001 2022 2
001 2021 -9
001 2020 -10
001 2019 -16
001 2018 -20
002 2022 400
002 2021 300
002 2020 -200
002 2019 -600
002 2018 -500

And the outcome should look somewhat like this:

ID Year Amount Y/N
001 2022 2 0
001 2021 -9 1
001 2020 -10 1
001 2019 -16 0
001 2018 -20 0
002 2022 400 0
002 2021 300 0
002 2020 -200 1
002 2019 -600 0
002 2018 -500 0

Does anybody know how I could code this?

CodePudding user response:

This should do it :

df <- read.table(h=T, text="ID  Year    Amount
001 2022    2
001 2021    -9
001 2020    -10
001 2019    -16
001 2018    -20
002 2022    400
002 2021    300
002 2020    -200
002 2019    -600
002 2018    -500")

library(tidyverse)
df %>% 
  group_by(ID) %>% 
  mutate(YN = rev(diff(c(0,0,0, cumsum(rev(Amount)<0)), 3)  == 3)) %>%
  ungroup()
#> # A tibble: 10 × 4
#>       ID  Year Amount YN   
#>    <int> <int>  <int> <lgl>
#>  1     1  2022      2 FALSE
#>  2     1  2021     -9 TRUE 
#>  3     1  2020    -10 TRUE 
#>  4     1  2019    -16 FALSE
#>  5     1  2018    -20 FALSE
#>  6     2  2022    400 FALSE
#>  7     2  2021    300 FALSE
#>  8     2  2020   -200 TRUE 
#>  9     2  2019   -600 FALSE
#> 10     2  2018   -500 FALSE

Created on 2022-04-21 by the reprex package (v2.0.1)

In base R:

df$YN <- ave(df$Amount, df$ID, FUN = function(x) rev(diff(c(0,0,0, cumsum(rev(x)<0)), 3)  == 3)) == 1

Here's a walkthrough :

df %>% 
  arrange(ID, Year) %>% 
  group_by(ID) %>% 
  mutate(
    cond = Amount < 0,
    cumsum = cumsum(cond),
    diff = diff(c(0,0,0, cumsum), 3),
    YN = diff == 3) # %>% arrange(ID, desc(Year)) # to put back in original shape
#> # A tibble: 10 × 7
#> # Groups:   ID [2]
#>       ID  Year Amount cond  cumsum  diff YN   
#>    <int> <int>  <int> <lgl>  <int> <dbl> <lgl>
#>  1     1  2018    -20 TRUE       1     1 FALSE
#>  2     1  2019    -16 TRUE       2     2 FALSE
#>  3     1  2020    -10 TRUE       3     3 TRUE 
#>  4     1  2021     -9 TRUE       4     3 TRUE 
#>  5     1  2022      2 FALSE      4     2 FALSE
#>  6     2  2018   -500 TRUE       1     1 FALSE
#>  7     2  2019   -600 TRUE       2     2 FALSE
#>  8     2  2020   -200 TRUE       3     3 TRUE 
#>  9     2  2021    300 FALSE      3     2 FALSE
#> 10     2  2022    400 FALSE      3     1 FALSE

And an easier solution to read, not generalisable to big spans:

library(tidyverse)
df %>% 
  group_by(ID) %>% 
  mutate(YN = (Amount < 0)   lead(Amount < 0, default = 0)   lead(Amount < 0, 2, default = 0) == 3) %>% 
  ungroup()

CodePudding user response:

Using the dataset

df <- data.frame(id = c(rep(1,5,),rep(2,5)), year = rep(seq(2022,2018),2), amount = c(2,-9,-10,-16,-20,400,300,-200,-600,-500))

A sapply function can be used:

df$test <- ifelse(sapply(seq(1,nrow(df)), function(x) all(subset(df, df$id == df$id[x] & df$year %in% seq(df$year[x]-2,df$year[x]))$amount < 0) & nrow(subset(df, df$id == df$id[x] & df$year %in% seq(df$year[x]-2,df$year[x]))) > 2),1,0)

That gives:

df
       id year amount test
    1   1 2022      2    0
    2   1 2021     -9    1
    3   1 2020    -10    1
    4   1 2019    -16    0
    5   1 2018    -20    0
    6   2 2022    400    0
    7   2 2021    300    0
    8   2 2020   -200    1
    9   2 2019   -600    0
    10  2 2018   -500    0

Breakdown:

ifelse to change the TRUE/FALSE from the SAPPLY to 1/0

sapply to iterate per row

all to test whether the condition holds for all values inside the subset

subset to select the last three years and the correct id

$amount < 0 to test the condition

CodePudding user response:

You could use data.table package.

library(data.table)

setDT(df)

df[order(Year), YN := match(Reduce(` `, shift(Amount<0, 0:2)), 3, 0), ID]

#        ID  Year Amount    YN
#  1:     1  2022      2     0
#  2:     1  2021     -9     1
#  3:     1  2020    -10     1
#  4:     1  2019    -16     0
#  5:     1  2018    -20     0
#  6:     2  2022    400     0
#  7:     2  2021    300     0
#  8:     2  2020   -200     1
#  9:     2  2019   -600     0
# 10:     2  2018   -500     0

data

df = structure(list(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), 
                    Year = c(2022, 2021, 2020, 2019, 2018, 2022, 2021, 2020, 2019, 2018), 
                    Amount = c(2, -9, -10, -16, -20, 400, 300, -200, -600, -500)), 
               row.names = c(NA, -10L), 
               class = "data.frame")
  •  Tags:  
  • r
  • Related