I'm trying to write an expression with dplyr::mutate in R that tells me whether El Niño or La Niña conditions occurred in the preceding months of my data frame.
The decision rule is:
If $TMA_{t-1} > 0.5 \,\, \mbox{and} \,\, TMA_{t-2} > 0.5 \,\, \mbox{and} \,\, TMA_{t-3} > 0.5 \,\, \mbox{and} \,\, TMA_{t-4} > 0.5 \,\, \mbox{and} \,\, TMA_{t-5} > 0.5 \,\, \mbox{then} \,\, \mbox{"El Niño"}$
else if
$TMA_{t-1} < -0.5 \,\, \mbox{and} \,\, TMA_{t-2} < -0.5 \,\, \mbox{and} \,\, TMA_{t-3} < -0.5 \,\, \mbox{and} \,\, TMA_{t-4} < -0.5 \,\, \mbox{and} \,\, TMA_{t-5} < -0.5 \,\, \mbox{then} \,\, \mbox{"La Niña"}$
else, if neither condition holds, leave the value blank.
More specifically:
If the five preceding consecutive TMA values are all above 0.5, the result is "El Niño"; otherwise, if the five preceding consecutive TMA values are all below -0.5, the result is "La Niña". If neither condition holds, leave the value blank (NA or NULL, for example).
Here is a small illustration of the problem, which I have already solved in a spreadsheet:
(Image: Excel formula implementing the decision rule)
In the Portuguese version of Excel, =SE(E( ... corresponds to =IF(AND( ...
In R, the data frame can be built as follows:
library(dplyr)
library(fpp3)
dates <- yearmonth(c(
"2018-02",
"2018-03",
"2018-04",
"2018-05",
"2018-06",
"2018-07",
"2018-08",
"2018-09",
"2018-10",
"2018-11",
"2018-12",
"2019-01",
"2019-02",
"2019-03",
"2019-04",
"2019-05",
"2018-06"
))
TMA <- c(
-0.85,
-0.69,
-0.50,
-0.22,
-0.01,
0.09,
0.23,
0.49,
0.76,
0.90,
0.82,
0.75,
0.73,
0.72,
0.66,
0.54,
0.45
)
df <- data.frame(dates, TMA)
df <- df %>%
  mutate(
    `Climatic Condition` =
      # The conditional statement I described above goes here... (HELP!)
  )
How can I complete the Climatic Condition column inside dplyr::mutate in R?
CodePudding user response:
You can use zoo's rolling operations.
library(dplyr)
library(zoo)
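df %>%
  # lag() shifts the result down one row, so each row reflects the five preceding months
  mutate(climatic_condition = lag(case_when(
    # TRUE where the current value and the four before it are all below -0.5
    rollapplyr(TMA < -0.5, 5, all, fill = FALSE) ~ "La Niña",
    # TRUE where the current value and the four before it are all above 0.5
    rollapplyr(TMA > 0.5, 5, all, fill = FALSE) ~ "El Niño")
  ))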
# dates TMA climatic_condition
#1 2018 Feb -0.85 <NA>
#2 2018 Mar -0.69 <NA>
#3 2018 Apr -0.50 <NA>
#4 2018 May -0.22 <NA>
#5 2018 Jun -0.01 <NA>
#6 2018 Jul 0.09 <NA>
#7 2018 Aug 0.23 <NA>
#8 2018 Sep 0.49 <NA>
#9 2018 Oct 0.76 <NA>
#10 2018 Nov 0.90 <NA>
#11 2018 Dec 0.82 <NA>
#12 2019 Jan 0.75 <NA>
#13 2019 Feb 0.73 <NA>
#14 2019 Mar 0.72 El Niño
#15 2019 Apr 0.66 El Niño
#16 2019 May 0.54 El Niño
#17 2018 Jun 0.45 El Niño
CodePudding user response:
You could use
library(dplyr)
df %>%
  mutate(condition = case_when(
    # the five preceding values are all above 0.5
    lag(TMA) > 0.5 & lag(TMA, 2) > 0.5 & lag(TMA, 3) > 0.5 & lag(TMA, 4) > 0.5 & lag(TMA, 5) > 0.5 ~ "El Niño",
    # the five preceding values are all below -0.5
    lag(TMA) < -0.5 & lag(TMA, 2) < -0.5 & lag(TMA, 3) < -0.5 & lag(TMA, 4) < -0.5 & lag(TMA, 5) < -0.5 ~ "La Niña"
  ))
This returns
dates TMA condition
1 2018-02-01 -0.85 <NA>
2 2018-03-01 -0.69 <NA>
3 2018-04-01 -0.50 <NA>
4 2018-05-01 -0.22 <NA>
5 2018-06-01 -0.01 <NA>
6 2018-07-01 0.09 <NA>
7 2018-08-01 0.23 <NA>
8 2018-09-01 0.49 <NA>
9 2018-10-01 0.76 <NA>
10 2018-11-01 0.90 <NA>
11 2018-12-01 0.82 <NA>
12 2019-01-01 0.75 <NA>
13 2019-02-01 0.73 <NA>
14 2019-03-01 0.72 El Niño
15 2019-04-01 0.66 El Niño
16 2019-05-01 0.54 El Niño
17 2018-06-01 0.45 El Niño
There are more sophisticated ways, but this is an easy approach.
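If the window length or the threshold might change, one possible generalization is sketched below. It is not taken from either answer: the helper names prev_n_all and classify_enso are hypothetical, and the code uses base R only, so the five hard-coded lags become a parameter n.

```r
# Hypothetical helper: out[i] is TRUE when flag[i - 1], ..., flag[i - n] are all TRUE.
# Rows with fewer than n predecessors are FALSE.
prev_n_all <- function(flag, n = 5) {
  len <- length(flag)
  out <- rep(FALSE, len)
  if (len > n) {
    for (i in (n + 1):len) {
      out[i] <- all(flag[(i - n):(i - 1)])
    }
  }
  out
}

# Hypothetical wrapper applying the decision rule from the question.
classify_enso <- function(tma, n = 5, threshold = 0.5) {
  el_nino <- prev_n_all(tma > threshold, n)
  la_nina <- prev_n_all(tma < -threshold, n)
  ifelse(el_nino, "El Niño",
         ifelse(la_nina, "La Niña", NA_character_))
}

# Same TMA values as in the question
TMA <- c(-0.85, -0.69, -0.50, -0.22, -0.01, 0.09, 0.23, 0.49, 0.76,
         0.90, 0.82, 0.75, 0.73, 0.72, 0.66, 0.54, 0.45)

condition <- classify_enso(TMA)
which(condition == "El Niño")
# [1] 14 15 16 17
```

Since classify_enso returns a vector of the same length as its input, it should also work inside mutate, e.g. mutate(condition = classify_enso(TMA)).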