I have a data frame like this one, with a start and end month and year for each row:
ID start_month start_year end_month end_year
1 1 2018 5 2019
2 5 1981 NA 1999
2 7 1973 NA 1981
2 7 1963 NA 1973
Several end months are missing, and I would like to fill them in so that the periods follow each other.
Within each ID, I would like to replace each NA with the start month of the row before, minus 1.
For NA-1999, which is the most recent period for subject 2 and so has no later period, I would like to put 7 for the month.
I would like to get something like this:
ID start_month start_year end_month end_year
1 1 2018 5 2019
2 5 1981 7 1999
2 7 1973 4 1981
2 7 1963 6 1973
I thought of using this:
df<-df %>% group_by(ID) %>% replace(end_month = ifelse(is.na(end_month), length(start_month)-1 , 7)) %>% ungroup()
My length(start_month)-1 argument and the replace function don't work, and I don't know what else to try.
I'm sorry if this isn't very clear; it's complicated to explain in writing.
Thank you in advance for your help.
CodePudding user response:
If I understand you correctly, you want to replace the NAs in end_month within the same ID by the following rules:
- start_month - 1, taken from the later period, for any period which has a later period
- 7 for the last period in each ID
Is that correct?
If so, then this should do the trick:
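library(dplyr)

df %>%
  group_by(ID) %>%
  # newest period first within each ID, so lag() looks at the later period
  arrange(ID, desc(start_year), desc(start_month)) %>%
  mutate(
    # missing end month = start month of the later period, minus 1
    end_month = ifelse(is.na(end_month), lag(start_month) - 1, end_month),
    # the newest period has no later period, so it is still NA here: use 7
    end_month = ifelse(is.na(end_month), 7, end_month)
  ) %>%
  ungroup()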
#> # A tibble: 4 × 5
#> ID start_month start_year end_month end_year
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 2018 5 2019
#> 2 2 5 1981 7 1999
#> 3 2 7 1973 4 1981
#> 4 2 7 1963 6 1973
Created on 2022-03-30 by the reprex package (v2.0.1)
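To see why the second ifelse() is needed, it may help to look at what lag() does on its own. This is only a small illustration using the start months of ID 2 in descending date order; the vector name start_months is made up for the example.

```r
library(dplyr)

# Start months of ID 2, newest period first (as after arrange())
start_months <- c(5, 7, 7)

# lag() shifts the values down by one position; the first element has
# nothing before it, so it becomes NA
shifted <- dplyr::lag(start_months)

# Candidate end months: NA for the newest period, then 5 - 1 and 7 - 1
candidates <- shifted - 1
```

The leading NA is the newest period of the group, which is why it is filled with 7 in a separate step.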
Data
df <- tibble::tribble(
  ~ID, ~start_month, ~start_year, ~end_month, ~end_year,
  1, 1, 2018, 5, 2019,
  2, 5, 1981, NA, 1999,
  2, 7, 1973, NA, 1981,
  2, 7, 1963, NA, 1973
)
df
#> # A tibble: 4 × 5
#> ID start_month start_year end_month end_year
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 2018 5 2019
#> 2 2 5 1981 NA 1999
#> 3 2 7 1973 NA 1981
#> 4 2 7 1963 NA 1973
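As a possible variation, the two ifelse() steps could be collapsed into a single coalesce() call, which takes the first non-missing value among its arguments. The sketch below also guards against one case the example data does not contain: a later period starting in January would give an end month of 0. The helper prev_month() is hypothetical and not part of dplyr, and wrapping to December is an assumption about what would be wanted there; end_year is left untouched.

```r
library(dplyr)

# Same example data as in the question
df <- tibble::tribble(
  ~ID, ~start_month, ~start_year, ~end_month, ~end_year,
  1, 1, 2018, 5, 2019,
  2, 5, 1981, NA, 1999,
  2, 7, 1973, NA, 1981,
  2, 7, 1963, NA, 1973
)

# Hypothetical helper: the month before `m`, wrapping January (1) to December (12)
prev_month <- function(m) (m - 2) %% 12 + 1

filled <- df %>%
  arrange(ID, desc(start_year), desc(start_month)) %>%
  group_by(ID) %>%
  mutate(
    # keep an existing end month; otherwise use the month before the later
    # period's start; otherwise (newest period in the ID) fall back to 7
    end_month = coalesce(end_month, prev_month(lag(start_month)), 7)
  ) %>%
  ungroup()
```

On the example data this should give the same end months as the answer above, since none of the later periods start in January.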