I have the following data frame:
df
id year certificate
1 2000 1
2 2003 1
3 2002 1
4 2004 1
I want to turn it into a long data frame with one row per id and year, where certificate is 1 from the given "year" onward and NA before it:
df_long
id year certificate
1 2000 1
1 2001 1
1 2002 1
1 2003 1
1 2004 1
2 2000 NA
2 2001 NA
2 2002 NA
2 2003 1
2 2004 1
CodePudding user response:
Two steps. First, create all combinations of id and year and merge them with the original data:
tmp <- merge(
  df,
  expand.grid(year = 2000:2004, id = 1:4),
  all = TRUE
)
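Then replace the missing values with 0 and take the cumulative sum within each id, so that certificate2 is 1 from the certificate year onward:
tmp$certificate[is.na(tmp$certificate)] <- 0
tmp$certificate2 <- ave(tmp$certificate, tmp$id, FUN = cumsum)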
...
id year certificate certificate2
11 3 2000 0 0
12 3 2001 0 0
13 3 2002 1 1
14 3 2003 0 1
15 3 2004 0 1
16 4 2000 0 0
17 4 2001 0 0
18 4 2002 0 0
19 4 2003 0 0
20 4 2004 1 1
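Note that the output above has 0 rather than NA before the certificate year. If the NA values from the question are wanted, a minimal sketch along the same lines could look like the following. It rebuilds df from the question, and the explicit sort and the final NA replacement are my additions, assuming that NA before the certificate year is really what is needed.

```r
# Input data from the question
df <- data.frame(id = 1:4,
                 year = c(2000, 2003, 2002, 2004),
                 certificate = 1)

# All id/year combinations, merged with the observed rows
tmp <- merge(df, expand.grid(year = 2000:2004, id = 1:4), all = TRUE)

# Sort explicitly so that cumsum() runs in year order within each id
tmp <- tmp[order(tmp$id, tmp$year), ]

# Running indicator: 0 before the certificate year, 1 from that year on
tmp$certificate[is.na(tmp$certificate)] <- 0
tmp$certificate2 <- ave(tmp$certificate, tmp$id, FUN = cumsum)

# Assumption: rows before the certificate year should be NA, not 0
tmp$certificate2[tmp$certificate2 == 0] <- NA
```

The column certificate2 then holds NA up to the year before the certificate and 1 afterwards.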
CodePudding user response:
A tidyverse solution. First expand the data frame to cover all ids and years, then set the certificate column to 1 in every row from the first year where certificate == 1 onward.
library(tibble)
library(dplyr)
library(tidyr)  # provides crossing()
library(purrr)
df <- tribble(~id, ~year, ~certificate,
              1, 2000, 1,
              2, 2003, 1,
              3, 2002, 1,
              4, 2004, 1)
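df |>
  # keep every id/year combination, with NA where no certificate was recorded
  right_join(crossing(id = 1:4, year = 2000:2004), by = c("id", "year")) |>
  group_by(id) |>
  arrange(year) |>
  # carry the first non-NA certificate forward within each id
  mutate(certificate = accumulate(certificate,
                                  ~if (is.na(.x)) .y else .x)) |>
  arrange(id, year)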
##> # A tibble: 20 × 3
##> # Groups: id [4]
##> id year certificate
##> <dbl> <dbl> <dbl>
##> 1 1 2000 1
##> 2 1 2001 1
##> 3 1 2002 1
##> 4 1 2003 1
##> 5 1 2004 1
##> 6 2 2000 NA
##> 7 2 2001 NA
##> 8 2 2002 NA
##> 9 2 2003 1
##> 10 2 2004 1
##> 11 3 2000 NA
##> 12 3 2001 NA
##> 13 3 2002 1
##> 14 3 2003 1
##> 15 3 2004 1
##> 16 4 2000 NA
##> 17 4 2001 NA
##> 18 4 2002 NA
##> 19 4 2003 NA
##> 20 4 2004 1
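As a possible alternative to the join plus accumulate() above, tidyr's complete() and fill() can do the same expansion and carry-forward. This is only a sketch under the assumption that the years should span the observed range in steps of 1 (hence full_seq()); the name df_long is taken from the question.

```r
library(dplyr)
library(tidyr)

# Input data from the question
df <- tibble(id = 1:4,
             year = c(2000, 2003, 2002, 2004),
             certificate = 1)

df_long <- df |>
  # add the missing id/year rows; certificate is NA in the new rows
  complete(id, year = full_seq(year, 1)) |>
  group_by(id) |>
  arrange(year, .by_group = TRUE) |>
  # carry the certificate value forward within each id
  fill(certificate, .direction = "down") |>
  ungroup()
```

Rows before the certificate year stay NA because fill() only propagates values downward within each group.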