I have a dataset that logically shouldn't have any negative values, but because of revisions to the data, it sometimes does. To adjust for this, I want to distribute negative values to the previous day's data.
For example, I'd like the values to change like so:
Old | New |
---|---|
18 | 18 |
12 | 10 |
1 | 0 |
0 | 0 |
0 | 0 |
-3 | 0 |
1 | 1 |
0 | 0 |
0 | 0 |
I'm trying to do this programmatically in R, but haven't been able to make it work.
Here's my latest attempt, which kinda works, but only partially and feels unstable:
library(tidyverse)
library(lubridate)
# Data
nums <- as_tibble(list(
value = c(18, 12, 1, 0, 0, -3, 1, 0, 0 ),
date = c("2022-01-01","2022-01-02","2022-01-03",
"2022-01-04","2022-01-05","2022-01-06",
"2022-01-07","2022-01-08","2022-01-09"))) |>
mutate(date = ymd(date)) %>%
select(date,value)
# Remove negatives
no_negs <- nums |>
arrange(desc(date)) |>
mutate(neww = if_else(lag(value,1) < 0, value lag(value,1),value)) |>
arrange(date) |>
mutate(neww = ifelse(is.na(neww), 0, neww))
Any thoughts on whether this is possible, and if so, how?
Thanks!
Edit: new example data
CodePudding user response:
Edits: reduction of code, fix for consecutive zeroes, use second data set, general improvement.
We can Reduce
it:
nums1 %>%
mutate(
add = Reduce(function(prev, this) min(this prev, 0),
value, init = 0, accumulate = TRUE, right = TRUE)[-1],
value2 = pmax(value add, 0)
) %>%
select(-add)
# # A tibble: 6 x 3
# date value value2
# <date> <dbl> <dbl>
# 1 2022-01-01 18 18
# 2 2022-01-02 12 5
# 3 2022-01-03 -3 0
# 4 2022-01-04 -4 0
# 5 2022-01-05 5 3
# 6 2022-01-06 -2 0
and the second dataset with more consecutive zeroes:
nums2 %>%
mutate(
add = Reduce(function(prev, this) min(this prev, 0),
value, init = 0, accumulate = TRUE, right = TRUE)[-1],
value2 = pmax(value add, 0)
) %>%
select(-add)
# # A tibble: 9 x 3
# date value value2
# <date> <dbl> <dbl>
# 1 2022-01-01 18 18
# 2 2022-01-02 12 10
# 3 2022-01-03 1 0
# 4 2022-01-04 0 0
# 5 2022-01-05 0 0
# 6 2022-01-06 -3 0
# 7 2022-01-07 1 1
# 8 2022-01-08 0 0
# 9 2022-01-09 0 0
Data
nums1 <- structure(list(date = structure(c(18993, 18994, 18995, 18996, 18997, 18998), class = "Date"), value = c(18, 12, -3, -4, 5, -2)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
nums2 <- structure(list(date = structure(c(18993, 18994, 18995, 18996, 18997, 18998, 18999, 19000, 19001), class = "Date"), value = c(18, 12, 1, 0, 0, -3, 1, 0, 0)), row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"))