I have a dataset that's columns are like this:
Forecaster CountryName ProductName X2022 X2023 file_month
Allianz USA Inflation 6.1 2.5 7
EIU USA Inflation 7.7 3.7 7
Oxford USA Inflation 6.5 2.9 7
Allianz USA Inflation NA 3.3 8
EIU USA Inflation 8.3 3.7 8
Data is in long format, file_month shows the month that the data was made. Now i need to create new column that refers to updated values by Forecaster, CountryName, ProductName and either X2022 or X2023. So in this new column it should be checked if Forecaster revised it's forecast compared to one month before and if so, the new revised value should be written in the column.
Also, as there is more than one ProductName, CountryName and Forecaster in whole dataset, I would like to know how could i make this work for every of these groups.
New data would need to look something like this
Forecaster CountryName ProductName X2022 X2023 file_month update_2022 update_2023
Allianz USA Inflation 6.1 2.5 7
EIU USA Inflation 7.7 3.7 7
Oxford USA Inflation 6.5 2.9 7
Allianz USA Inflation NA 3.3 8 NA 3.3
EIU USA Inflation 8.3 3.7 8 8.3 NA
For code reproducability i add dput() output.
data <- structure(list(Forecaster = c("Moody's Analytics", "Oxford Economics",
"Julius Baer", "Fitch Solutions", "SEB", "Allianz", "EIU", "FrontierView",
"Euromonitor Int.", "Fitch Ratings", "Scope Ratings", "Allianz",
"EIU", "Euromonitor Int.", "Fitch Ratings", "Fitch Solutions",
"FrontierView", "Julius Baer", "Moody's Analytics", "Oxford Economics",
"Scope Ratings", "SEB", "Allianz", "EIU", "Euromonitor Int.",
"Fitch Ratings", "Fitch Solutions", "FrontierView", "Julius Baer",
"Moody's Analytics", "Oxford Economics", "Scope Ratings", "SEB",
"Allianz", "EIU", "Euromonitor Int.", "Fitch Ratings", "Fitch Solutions",
"FrontierView", "Julius Baer", "Moody's Analytics", "Oxford Economics",
"Scope Ratings", "SEB", "Allianz", "EIU", "Euromonitor Int.",
"Fitch Ratings", "Fitch Solutions", "FrontierView", "Julius Baer",
"Moody's Analytics", "Oxford Economics", "Scope Ratings"),
CountryName = c("USA",
"USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA",
"USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA",
"USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA",
"USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA",
"USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA",
"USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA"),
ProductName = c("Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation"),
X2022 = c(6.58180388987726,
7.39615, 5.71530516215215, 6.5, NA, 6.1, 7.2, 6.2, 7.31, 6.9,
NA, 6.1, 7.7, 7.49, 6.9, 6.5, 6.2, 7.01535416655227, 6.94177167480419,
7.40344, NA, 7.5, 6.1, 7.7, 7.49, 6.9, 6.5, 6.2, 7.01535416655227,
6.94177167480419, 7.40344, NA, 7.5, NA, 8.3, 7.99, 7.8313, 7.3,
8.4, 7.45242025035952, 7.51681072158563, 8.04052, NA, 7.5, 8.2,
8.1, 8.01, 7.8313, 7.7, 8.6, 7.94553624147252, 7.82785941572661,
8.11193, NA),
X2023 = c(2.62716826359282, 1.73992, 2.32590118928029,
2.5, NA, 2.5, 2.8, 3.3, 4.02, 2.7, NA, 2.5, 3.7, 4.01, 2.7, 2.5,
3.3, 2.96909559451206, 2.70488419782626, 2.01984, NA, 4.2, 2.5,
3.7, 4.01, 2.7, 2.5, 3.3, 2.96909559451206, 2.70488419782626,
2.01984, NA, 4.2, NA, 3.7, 4.21, 3.7199, 3.5, 4.9, 3.02315479834998,
3.10232274684858, 3.17401, NA, 4.2, 3.2, 3.8, 4, 3.7199, 3.7,
5.7, 3.48906158848119, 3.36435085973137, 3.27724, NA),
file_month = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L)), class = "data.frame", row.names = c(NA,-54L))
CodePudding user response:
Here are some notes about how things work here:
- First we group the data set on Forecaster variable. For this purpose we use
dplyr::group_split
function that creates a list where each elements is your data set with a unique Forecaster value. - Second we use
purrr::map_dfr
to apply a function on each elements of the previous list. In a way that we create a custom function and apply it on each data frame.dfr
suffice implies that in the end themap
binds all the modified data frames together by row. - In this case we could consider a rolling function. There are various packages that you could use for this purpose such as
zoo
,runner
,slider
and etc. I used slider because it works well withtidyverse
functions. - I used
slider::slide_dbl
function where it takes a given function and apply it on a sliding window of the size that we choose. Here we would like to compare each value corresponding to a month to its previous value so the size of our sliding window is 2 as specified by.before
argument. - Then I wrote a custom function which is in fact a series of
if
,else
clause to apply what you have in mind. - I then set
.complete
argument inslider_dbl
toTRUE
, so that the function is only applied on a complete window. It means for the first month that appears in the data set there is no output value cause there is no previous month. - In the end I would like to apply the entire
slider_dbl
function on 2 columns of each data frame so I used it insidedplyer::across
.
library(tidyverse)
library(slider)
data %>%
group_split(Forecaster) %>%
map_dfr(~ .x %>%
mutate(across(c(X2022, X2023), ~ slide_dbl(., .f = ~ if(is.na(.x[2])) {
.x[2]
} else if(is.na(.x[1]) & !is.na(.x[2])){
.x[2]
} else if((.x[1] - .x[2]) != 0) {
.x[2]
} else {
NA_real_
}, .before = 1, .complete = TRUE), .names = 'Updated_{.col}')))
# A tibble: 54 × 8
Forecaster CountryName ProductName X2022 X2023 file_month Updated_X2022 Updated_X2023
<chr> <chr> <chr> <dbl> <dbl> <int> <dbl> <dbl>
1 Allianz USA Inflation 6.1 2.5 5 NA NA
2 Allianz USA Inflation 6.1 2.5 6 NA NA
3 Allianz USA Inflation 6.1 2.5 7 NA NA
4 Allianz USA Inflation NA NA 8 NA NA
5 Allianz USA Inflation 8.2 3.2 9 8.2 3.2
6 EIU USA Inflation 7.2 2.8 5 NA NA
7 EIU USA Inflation 7.7 3.7 6 7.7 3.7
8 EIU USA Inflation 7.7 3.7 7 NA NA
9 EIU USA Inflation 8.3 3.7 8 8.3 NA
10 EIU USA Inflation 8.1 3.8 9 8.1 3.8
# … with 44 more rows
I suggest you run this code line by line to check the output and get a sense of what is going on here.