So I have a dataframe structured like this:
df <- data.frame("id" = c(rep("a",4),rep("b",4)),
"Year" = c(2020,2019,2018,2017,
2020,2019,2018,2017),
"value" = c(30,20,0,0,
70,50,30,0))
> df
id Year value
1 a 2020 30
2 a 2019 20
3 a 2018 0
4 a 2017 0
5 b 2020 70
6 b 2019 50
7 b 2018 30
8 b 2017 0
What I want to do is create a new column which has the same values as the value column, except wherever there is a 0 value it looks at the closest year with a non-zero value and applies that value to all 0 rows, by each id. So the output should be:
> df
id Year value newoutput
1 a 2020 30 30
2 a 2019 20 20
3 a 2018 0 20
4 a 2017 0 20
5 b 2020 70 70
6 b 2019 50 50
7 b 2018 30 30
8 b 2017 0 30
So for id a we see that years 2018 and 2017 both have 0 values, so they need to be amended. The next year which has a non-zero value is 2019, so we take the value in that year, which is 20, and apply it to both 2018 and 2017. Similarly for id b.
Any ideas on how to do this using dplyr?
CodePudding user response:
Use fill from tidyr, after using na_if to convert value to NA wherever it is 0.
library(tidyr)
library(dplyr)
df %>%
group_by(id) %>%
mutate(newoutput = na_if(value, 0)) %>%
fill(newoutput)
id Year value newoutput
<chr> <dbl> <dbl> <dbl>
1 a 2020 30 30
2 a 2019 20 20
3 a 2018 0 20
4 a 2017 0 20
5 b 2020 70 70
6 b 2019 50 50
7 b 2018 30 30
8 b 2017 0 30
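The fill approach above depends on the rows already being ordered by descending Year within each id, since fill copies the last non-missing value downwards. The following is a minimal sketch, not part of the original answer, that assumes the rows may arrive unsorted and therefore sorts them first; the shuffled df_unsorted data frame is a hypothetical variant of the question's data, made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: same values as in the question, but id "a" is shuffled
df_unsorted <- data.frame(
  id    = c(rep("a", 4), rep("b", 4)),
  Year  = c(2017, 2020, 2018, 2019, 2020, 2019, 2018, 2017),
  value = c(0, 30, 0, 20, 70, 50, 30, 0)
)

result <- df_unsorted %>%
  group_by(id) %>%
  # Sort newest year first within each id so that "down" means "older years"
  arrange(desc(Year), .by_group = TRUE) %>%
  # Turn zeros into NA so that fill() treats them as gaps
  mutate(newoutput = na_if(value, 0)) %>%
  # Carry the last non-missing value down into the gaps, per id
  fill(newoutput, .direction = "down") %>%
  ungroup()

print(result)
```

If an id could also have zeros in its most recent years, .direction = "downup" might be worth considering, so that leading gaps are filled from the next non-zero year.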
CodePudding user response:
A possible solution, based on dplyr and cummax:
library(dplyr)
df %>%
group_by(id) %>%
mutate(newoutput = value + cummax((value == 0) * lag(value, default = T))) %>%
ungroup()
#> # A tibble: 8 × 4
#> id Year value newoutput
#> <chr> <dbl> <dbl> <dbl>
#> 1 a 2020 30 30
#> 2 a 2019 20 20
#> 3 a 2018 0 20
#> 4 a 2017 0 20
#> 5 b 2020 70 70
#> 6 b 2019 50 50
#> 7 b 2018 30 30
#> 8 b 2017 0 30