Home > Software engineering >  Fill NAs with 0 until first numeric observation and last value until end of DF
Fill NAs with 0 until first numeric observation and last value until end of DF

Time:10-14

I have a DF with account numbers and the balance on a monthly basis. For several months there is no value (NAs). I want to fill in all NAs with 0 until R finds the first month with a value. When found the first observation it should fill in the last value found in the DF. For the end of the DF I want to use the last found observation. There is a sample of the DF, to make it easier to understand

Test = structure(list(Account = c("9876", "9876", "9876", "9876", "9876", 
                                  "9876", "9876", "9876", "9876", "9876", "9876", "9876", "9876", 
                                  "9876", "9876", "9876", "9876", "9876", "9876", "9876", "9876", 
                                  "9876", "9876", "9876", "9876", "9876", "9876", "9876", "1234", 
                                  "1234", "1234", "1234", "1234", "1234", "1234", "1234", "1234", 
                                  "1234", "1234", "1234", "1234", "1234", "1234", "1234", "1234", 
                                  "1234", "1234", "1234", "1234", "1234", "1234", "1234", "1234", 
                                  "1234", "1234", "1234"), Date = structure(c(17409, 17439, 17470, 
                                                                              17500, 17531, 17562, 17590, 17621, 17651, 17682, 17712, 17743, 
                                                                              17774, 17804, 17835, 17865, 17896, 17927, 17955, 17986, 18016, 
                                                                              18047, 18077, 18108, 18139, 18169, 18200, 18230, 17409, 17439, 
                                                                              17470, 17500, 17531, 17562, 17590, 17621, 17651, 17682, 17712, 
                                                                              17743, 17774, 17804, 17835, 17865, 17896, 17927, 17955, 17986, 
                                                                              18016, 18047, 18077, 18108, 18139, 18169, 18200, 18230), class = "Date"), 
                      Balance = c(NA, NA, NA, 0, NA, -0.0025, -0.0025, NA, 0, 0, 
                                  NA, NA, 0, NA, 0, NA, NA, 0, NA, 0, 0, 0, NA, NA, NA, NA, 
                                  0, 0, NA, -2097.774, -2097.774, NA, NA, -3339.004, NA, NA, 
                                  NA, -5791.112, NA, NA, 0, 0, 0, NA, NA, 0, 0, 0, NA, 0, -90.30116, 
                                  -90.30116, NA, NA, NA, -474.4858), `First Observation` = c(NA, 
                                                                                             NA, NA, 1L, NA, 0L, 0L, NA, 0L, 0L, NA, NA, 0L, NA, 0L, NA, 
                                                                                             NA, 0L, NA, 0L, 0L, 0L, NA, NA, NA, NA, 0L, 0L, NA, 1L, 0L, 
                                                                                             NA, NA, 0L, NA, NA, NA, 0L, NA, NA, 0L, 0L, 0L, NA, NA, 0L, 
                                                                                             0L, 0L, NA, 0L, 0L, 0L, NA, NA, NA, 0L)), class = c("data.table", 
                                                                                                                                                 "data.frame"), row.names = c(NA, -56L))
                 

Account number 9876:

So my desired output is that I get 0s in the balance for Date 2017-08 until 2017-10 for until the first observation on the 2017-11 is reached 2017-12 I would like to get the observation before (2017-11) with a balance of 0 2018-03, 2018-06, 2018-07 I would like to get the observation before (2018-02 and 2018-05) with a balance of -0.00250 From 2019-11 until the end of my time series I would like to get the last observation (here it is 2019-11, which is as well the end of my time series, but I cut the data. The original one reaches until 2022-08)

Account 1234:

For 2017-08 I would like to get 0 until I reach my first observation with a value For 2017-11 and 12 I would like to get -2097.774 (which is the balance of 2017-10)

So basically I would like to have 0s for the beginning of my time series in the balance until I reach my first account balance, then take the last known balance for all other dates and then have the same value of my last balance until I reach the end of my time series.

Thank You!

CodePudding user response:

The function fill from the tidyr package allows you to fill missing values with the preceding or following entry.

After filling the values downwards, NAs before the first observation can then be replace with 0 using replace_na.

library(tidyr)
library(dplyr)

Test %>%
  group_by(Account) %>%
  fill(Balance, .direction = 'down') %>%
  replace_na(list(Balance = 0))
  •  Tags:  
  • r
  • Related