replace NA with value of the previous row

Time:03-30

I have a data frame like this one, with a start and end month and year for each period.

ID  start_month start_year  end_month   end_year
1   1   2018    5   2019
2   5   1981    NA  1999
2   7   1973    NA  1981
2   7   1963    NA  1973

Several end months are missing, and I would like to fill them in so that, within each ID, the periods follow each other. Specifically, I want to replace each NA with the start month of the previous row minus 1, grouped by ID.

For the period ending NA/1999, which is the most recent one for subject 2 and has no later period after it, I would like to use 7 as the month.

I would like to get something like this:

ID  start_month start_year  end_month   end_year
1   1   2018    5   2019
2   5   1981    7   1999
2   7   1973    4   1981
2   7   1963    6   1973

I thought of using this:

df<-df %>% group_by(ID) %>% replace(end_month = ifelse(is.na(end_month), length(start_month)-1 , 7)) %>% ungroup()

My "length(start_month)-1" argument and the replace() call don't work, and I don't know what else to try.

I'm sorry if this isn't very clear; it's hard to explain in writing.

Thank you in advance for your help.
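
A quick way to see why that attempt fails: inside a grouped mutate(), length(start_month) is the number of rows in the group, a single number recycled to every row, not the previous row's value. dplyr::lag() is the function that shifts a column down by one row. Also, replace() is the base R function replace(x, list, values), not a dplyr verb, so new column values are computed with mutate(). A minimal sketch on the subject 2 rows (the helper columns n_rows and prev_start are hypothetical, just for illustration):

```r
library(dplyr)

# The three subject 2 rows from the question, most recent period first
df2 <- tibble::tibble(
  ID = c(2, 2, 2),
  start_month = c(5, 7, 7),
  start_year = c(1981, 1973, 1963)
)

res <- df2 %>%
  group_by(ID) %>%
  mutate(
    n_rows = length(start_month),  # group size: the same value (3) on every row
    prev_start = lag(start_month)  # previous row's start month: NA, 5, 7
  ) %>%
  ungroup()

res
```

The first row of each group gets NA from lag() because there is no row before it, which is exactly the row that needs the fallback value of 7.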

Answer:

If I understand you correctly, you want to replace the NAs in end_month within the same ID according to the following rules:

  • the later period's start_month - 1 for any period that has a later period
  • 7 for the last period in each ID

Is that correct?

If so, then this should do the trick:

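library(dplyr)

df %>% 
  group_by(ID) %>% 
  # most recent period first, so lag() returns the next (later) period
  arrange(ID, desc(start_year), desc(start_month)) %>% 
  mutate(
    # fill NA with the later period's start month minus 1
    end_month = ifelse(is.na(end_month), lag(start_month) - 1, end_month),
    # the most recent period has no later period (lag() gives NA), so use 7
    end_month = ifelse(is.na(end_month), 7, end_month)
  ) %>% 
  ungroup()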

#> # A tibble: 4 × 5
#>      ID start_month start_year end_month end_year
#>   <dbl>       <dbl>      <dbl>     <dbl>    <dbl>
#> 1     1           1       2018         5     2019
#> 2     2           5       1981         7     1999
#> 3     2           7       1973         4     1981
#> 4     2           7       1963         6     1973

Created on 2022-03-30 by the reprex package (v2.0.1)
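
An equivalent, slightly more compact variant (a sketch, not part of the original answer) uses dplyr::coalesce(), which returns the first non-missing value across its arguments, element by element. It also sorts within groups with arrange(..., .by_group = TRUE) instead of repeating ID in the sort keys:

```r
library(dplyr)

df <- tibble::tribble(
  ~ID, ~start_month, ~start_year, ~end_month, ~end_year,
  1,   1,            2018,        5,          2019,
  2,   5,            1981,        NA,         1999,
  2,   7,            1973,        NA,         1981,
  2,   7,            1963,        NA,         1973
)

filled <- df %>%
  group_by(ID) %>%
  # within each ID, most recent period first, so lag() looks at the later period
  arrange(desc(start_year), desc(start_month), .by_group = TRUE) %>%
  mutate(
    # keep an existing end_month, else the later period's start month minus 1,
    # else 7 (the most recent period has no later period to borrow from)
    end_month = coalesce(end_month, lag(start_month) - 1, 7)
  ) %>%
  ungroup()

filled
```

The order of the coalesce() arguments encodes the priority of the two rules, so adding another fallback later only means adding another argument.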

Data

df <- tibble::tribble(
~ID,  ~start_month, ~start_year,  ~end_month,   ~end_year,
1,   1,   2018,    5,   2019,
2,   5,   1981,    NA,  1999,
2,   7,   1973,    NA,  1981,
2,   7,   1963,    NA,  1973
)

df
#> # A tibble: 4 × 5
#>      ID start_month start_year end_month end_year
#>   <dbl>       <dbl>      <dbl>     <dbl>    <dbl>
#> 1     1           1       2018         5     2019
#> 2     2           5       1981        NA     1999
#> 3     2           7       1973        NA     1981
#> 4     2           7       1963        NA     1973
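
For readers not using dplyr, here is a base-R sketch of the same two rules (a swapped-in technique, not from the original answer). It orders the rows the same way and uses ave() to compute each row's previous start month within its ID; prev_start is a hypothetical helper name:

```r
df <- data.frame(
  ID = c(1, 2, 2, 2),
  start_month = c(1, 5, 7, 7),
  start_year = c(2018, 1981, 1973, 1963),
  end_month = c(5, NA, NA, NA),
  end_year = c(2019, 1999, 1981, 1973)
)

# Within each ID, put the most recent period first
df <- df[order(df$ID, -df$start_year, -df$start_month), ]

# Previous row's start_month within the same ID (NA for the first row of each ID)
prev_start <- ave(df$start_month, df$ID, FUN = function(x) c(NA, head(x, -1)))

# Rule 1: the later period's start month minus 1
df$end_month <- ifelse(is.na(df$end_month), prev_start - 1, df$end_month)
# Rule 2: anything still missing is the most recent period, so use 7
df$end_month <- ifelse(is.na(df$end_month), 7, df$end_month)

df
```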
