Creating new column based on minimum value on another column - R

Time:02-08

So I have a dataframe structured like this:

df <- data.frame("id" = c(rep("a",4),rep("b",4)),
                 "Year" = c(2020,2019,2018,2017,
                            2020,2019,2018,2017),
                 "value" = c(30,20,0,0,
                             70,50,30,0))


> df
  id Year value
1  a 2020    30
2  a 2019    20
3  a 2018     0
4  a 2017     0
5  b 2020    70
6  b 2019    50
7  b 2018    30
8  b 2017     0

I want to create a new column that copies the value column, except that within each id every 0 is replaced by the value from the closest later year that has a non-zero value. So the output should be:

> df
  id Year value newoutput
1  a 2020    30        30
2  a 2019    20        20
3  a 2018     0        20
4  a 2017     0        20
5  b 2020    70        70
6  b 2019    50        50
7  b 2018    30        30
8  b 2017     0        30

For id a, the years 2018 and 2017 both have 0 values, so they need to be amended. The closest later year with a non-zero value is 2019, so we take its value, 20, and apply it to both 2018 and 2017. The same logic applies to id b.

Any ideas on how to do this using dplyr?

CodePudding user response:

Use na_if() to turn the 0 values into NA, then tidyr's fill() to carry the last non-missing value down within each id.

library(tidyr)
library(dplyr)
df %>% 
  group_by(id) %>% 
  mutate(newoutput = na_if(value, 0)) %>% 
  fill(newoutput)

  id     Year value newoutput
  <chr> <dbl> <dbl>     <dbl>
1 a      2020    30        30
2 a      2019    20        20
3 a      2018     0        20
4 a      2017     0        20
5 b      2020    70        70
6 b      2019    50        50
7 b      2018    30        30
8 b      2017     0        30
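
The answer above works because the rows are already sorted from newest to oldest year within each id, and fill() fills downwards by default. As a minimal sketch, assuming the real data might not arrive in that order, the rows can be sorted explicitly first. The shuffled data frame below is made up for illustration and is not part of the question.

```r
library(dplyr)
library(tidyr)

# Data from the question
df <- data.frame(id = c(rep("a", 4), rep("b", 4)),
                 Year = c(2020, 2019, 2018, 2017,
                          2020, 2019, 2018, 2017),
                 value = c(30, 20, 0, 0,
                           70, 50, 30, 0))

# Hypothetical: the same rows in an arbitrary order
shuffled <- df[c(3, 1, 8, 2, 6, 4, 7, 5), ]

result <- shuffled %>%
  arrange(id, desc(Year)) %>%               # newest year first within each id
  group_by(id) %>%
  mutate(newoutput = na_if(value, 0)) %>%   # 0 becomes NA
  fill(newoutput, .direction = "down") %>%  # carry last non-NA value down
  ungroup()

result
```

Sorting by desc(Year) means "down" always points towards older years, so each 0 picks up the value of the closest later year regardless of the input order.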

CodePudding user response:

A possible solution, based on dplyr and cummax:

library(dplyr)

df %>% 
  group_by(id) %>% 
  mutate(newoutput = value + cummax((value == 0) * lag(value, default = T))) %>% 
  ungroup()

#> # A tibble: 8 × 4
#>   id     Year value newoutput
#>   <chr> <dbl> <dbl>     <dbl>
#> 1 a      2020    30        30
#> 2 a      2019    20        20
#> 3 a      2018     0        20
#> 4 a      2017     0        20
#> 5 b      2020    70        70
#> 6 b      2019    50        50
#> 7 b      2018    30        30
#> 8 b      2017     0        30
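
To see why the expression value + cummax((value == 0) * lag(value, default = T)) gives the right result here, it may help to break it into steps for a single group. The sketch below uses the values of id a only; the intermediate variable names are made up for illustration.

```r
library(dplyr)

value <- c(30, 20, 0, 0)                       # id "a", newest year first

is_zero   <- value == 0                        # FALSE FALSE TRUE TRUE
previous  <- dplyr::lag(value, default = TRUE) # 1 30 20 0 (TRUE is coerced to 1)
candidate <- is_zero * previous                # 0 0 20 0
carried   <- cummax(candidate)                 # 0 0 20 20
newoutput <- value + carried                   # 30 20 20 20

newoutput
```

This relies on the shape of the sample data, where the zeros sit in a single run at the end of each group. For other layouts the result can differ from what fill() would give, so it is worth checking against your own data.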