I'm trying to replace missing age values in one wave by adding 1 to the value from the previous wave. So, for instance:
ID | Age | Wave |
---|---|---|
1 | 20 | 1 |
1 | NA | 2 |
2 | 61 | 1 |
2 | NA | 2 |
would become
ID | Age | Wave |
---|---|---|
1 | 20 | 1 |
1 | 21 | 2 |
2 | 61 | 1 |
2 | 62 | 2 |
CodePudding user response:
library(tidyverse)
df %>%
mutate(Age = case_when(is.na(Age) ~ lag(Age) 1,
TRUE ~ Age))
# A tibble: 4 x 3
ID Age Wave
<dbl> <dbl> <dbl>
1 1 20 1
2 1 21 2
3 2 61 1
4 2 62 2
CodePudding user response:
Base R
> ave(df$Age,df$ID,FUN=function(x){x[1] seq_along(x)-1})
[1] 20 21 61 62
CodePudding user response:
With tidyverse, assuming your data is in df
dataframe:
library(tidyverse)
df %>%
group_by(ID) %>% arrange(ID, Wave) %>%
mutate(missing_grp = cumsum( (is.na(Age)!=is.na(lag(Age))) | !is.na(Age) )) %>%
group_by(ID, missing_grp) %>%
mutate(age_offset=cumsum(is.na(Age))) %>%
group_by(ID) %>%
fill(Age, .direction='down') %>%
mutate(Age = Age age_offset) %>%
ungroup() %>% select(-missing_grp, -age_offset)
It works also with multiple successive missing ages. For the following input:
df <- tribble(
~ID, ~Age, ~Wave,
1, 21, 1,
1, NA, 2,
2, 61, 1,
2, NA, 2,
2, NA, 3,
2, 70, 4,
2, NA, 5,
)
it returns:
# A tibble: 7 × 3
ID Age Wave
<dbl> <dbl> <dbl>
1 1 21 1
2 1 22 2
3 2 61 1
4 2 62 2
5 2 63 3
6 2 70 4
7 2 71 5
CodePudding user response:
In base R
within(df, Age[is.na(Age)] <- Age[which(is.na(Age)) - 1] 1)
#> ID Age Wave
#> 1 1 20 1
#> 2 1 21 2
#> 3 2 61 1
#> 4 2 62 2
CodePudding user response:
If you have more than two waves, we could use the row number:
library(dplyr)
library(tidyverse)
df |>
group_by(ID) |>
fill(Age) |>
mutate(Age = Age row_number() - 1) |>
ungroup()
Output:
# A tibble: 5 × 3
ID Age Wave
<dbl> <dbl> <dbl>
1 1 21 1
2 1 22 2
3 2 61 1
4 2 62 2
5 2 63 3