In the data below we observe a virtual GDP growth of a certain country over time. My aim is to create a variable with three categories: 0= no crisis, 1= crisis, 2= severe crisis. That would be identify economic crises as years where the growth rate reaches at least one (crisis) or two (severe) standard deviations below the average of the previous 3-year growth trend.
Could give some guidance please?
growth year
5 1990
4 1991
0 1992
-4 1993
-3 1994
-1 1995
2 1996
4 1997
7 1998
10 1999
8 2000
-10 2001
-8 2002
2 2003
4 2004
5 2005
8 2006
4 2007
-10 2008
-9 2009
-8 2010
-3 2011
0 2012
-5 2013
-6 2014
-2 2015
4 2016
5 2017
5 2018
8 2019
2 2020
-1 2021
-1 2022
Here is the data:
df=structure(list(gdp_growth = c(5, 4, 0, -4, -3, -1, 2, 4, 7, 10,
8, -10, -8, 2, 4, 5, 8, 4, -10, -9, -8, -3, 0, -5, -6, -2, 4,
5, 5, 8, 2, -1, -1), year = c(1990, 1991, 1992, 1993, 1994, 1995,
1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006,
2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
2018, 2019, 2020, 2021, 2022)), row.names = c(NA, -33L), class = "data.frame")
CodePudding user response:
From your description, it sounds as though you first need to calculate the rolling mean of growth, then compare the current year's growth to this:
library(dplyr)
df %>%
mutate(mn = zoo::rollmean(gdp_growth, 3, na.pad = TRUE, align = 'right'),
crisis = ifelse(gdp_growth < (mn - sd(gdp_growth)),
ifelse(gdp_growth < (mn - 2 * sd(gdp_growth)),
2, 1), 0)) %>%
select(-mn)
#> gdp_growth year crisis
#> 1 5 1990 NA
#> 2 4 1991 NA
#> 3 0 1992 0
#> 4 -4 1993 0
#> 5 -3 1994 0
#> 6 -1 1995 0
#> 7 2 1996 0
#> 8 4 1997 0
#> 9 7 1998 0
#> 10 10 1999 0
#> 11 8 2000 0
#> 12 -10 2001 2
#> 13 -8 2002 0
#> 14 2 2003 0
#> 15 4 2004 0
#> 16 5 2005 0
#> 17 8 2006 0
#> 18 4 2007 0
#> 19 -10 2008 1
#> 20 -9 2009 0
#> 21 -8 2010 0
#> 22 -3 2011 0
#> 23 0 2012 0
#> 24 -5 2013 0
#> 25 -6 2014 0
#> 26 -2 2015 0
#> 27 4 2016 0
#> 28 5 2017 0
#> 29 5 2018 0
#> 30 8 2019 0
#> 31 2 2020 0
#> 32 -1 2021 0
#> 33 -1 2022 0
CodePudding user response:
Here's another example, this time using RcppRoll
package which has loads of fast rolling functions compatible with dplyr
.
library(dplyr)
df %>%
mutate(
std3 = RcppRoll::roll_sd(gdp_growth , 3, fill=0, align = "right"),
crisis = case_when(
std3 < 1 ~ 'no crisis',
std3 < 2 ~ 'crisis',
T ~ 'severe crisis'
)
)
#> gdp_growth year std3 crisis
#> 1 5 1990 0.0000000 no crisis
#> 2 4 1991 0.0000000 no crisis
#> 3 0 1992 2.6457513 severe crisis
#> 4 -4 1993 4.0000000 severe crisis
#> 5 -3 1994 2.0816660 severe crisis
#> 6 -1 1995 1.5275252 crisis
#> 7 2 1996 2.5166115 severe crisis
#> 8 4 1997 2.5166115 severe crisis
#> 9 7 1998 2.5166115 severe crisis
#> 10 10 1999 3.0000000 severe crisis
#> 11 8 2000 1.5275252 crisis
#> 12 -10 2001 11.0151411 severe crisis
#> 13 -8 2002 9.8657657 severe crisis
#> 14 2 2003 6.4291005 severe crisis
#> 15 4 2004 6.4291005 severe crisis
#> 16 5 2005 1.5275252 crisis
#> 17 8 2006 2.0816660 severe crisis
#> 18 4 2007 2.0816660 severe crisis
#> 19 -10 2008 9.4516313 severe crisis
#> 20 -9 2009 7.8102497 severe crisis
#> 21 -8 2010 1.0000000 crisis
#> 22 -3 2011 3.2145503 severe crisis
#> 23 0 2012 4.0414519 severe crisis
#> 24 -5 2013 2.5166115 severe crisis
#> 25 -6 2014 3.2145503 severe crisis
#> 26 -2 2015 2.0816660 severe crisis
#> 27 4 2016 5.0332230 severe crisis
#> 28 5 2017 3.7859389 severe crisis
#> 29 5 2018 0.5773503 no crisis
#> 30 8 2019 1.7320508 crisis
#> 31 2 2020 3.0000000 severe crisis
#> 32 -1 2021 4.5825757 severe crisis
#> 33 -1 2022 1.7320508 crisis
Created on 2022-07-11 by the reprex package (v2.0.1)
CodePudding user response:
You could use lag
, rowwise
*, and mutate
within dplyr
:
library(dplyr)
df |>
mutate(gdp3_growth_lag1 = lag(gdp_growth, 1),
gdp3_growth_lag2 = lag(gdp_growth, 2),
gdp3_growth_lag3 = lag(gdp_growth, 3)) |>
rowwise() |>
mutate(
gdp3_growth_mean = mean(c_across(starts_with("gdp3_growth_lag"))),
gdp3_growth_sd = sd(c_across(starts_with("gdp3_growth_lag")))
) |>
ungroup() |>
mutate(crisis = case_when(gdp_growth <= gdp3_growth_mean - 2 * gdp3_growth_sd ~ 2,
gdp_growth <= gdp3_growth_mean - gdp3_growth_sd ~ 1,
is.na(gdp3_growth_mean) ~ NA_real_,
TRUE ~ 0)) |>
select(-starts_with("gdp3"))
Output:
# A tibble: 33 × 3
gdp_growth year crisis
<dbl> <dbl> <dbl>
1 5 1990 NA
2 4 1991 NA
3 0 1992 NA
4 -4 1993 2
5 -3 1994 0
6 -1 1995 0
7 2 1996 0
8 4 1997 0
9 7 1998 0
10 10 1999 0
11 8 2000 0
12 -10 2001 2
13 -8 2002 0
14 2 2003 0
15 4 2004 0
16 5 2005 0
17 8 2006 0
18 4 2007 0
19 -10 2008 2
20 -9 2009 1
21 -8 2010 0
22 -3 2011 0
23 0 2012 0
24 -5 2013 0
25 -6 2014 1
26 -2 2015 0
27 4 2016 0
28 5 2017 0
29 5 2018 0
30 8 2019 0
31 2 2020 2
32 -1 2021 2
33 -1 2022 0
Updated with full output.
(*) There is also rowSds
in matrixStats