R: "long" data frame to longer data frame with running indicator

Time:05-25

I have the following data frame:

df
id   year   certificate
 1   2000   1
 2   2003   1
 3   2002   1
 4   2004   1

I want to expand it into a longer data frame with one row per id and year (2000 to 2004), where certificate is 1 from that id's year onward and NA before it:

df_long
id   year   certificate
 1   2000   1
 1   2001   1
 1   2002   1
 1   2003   1
 1   2004   1
 2   2000   NA
 2   2001   NA
 2   2002   NA
 2   2003   1
 2   2004   1
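The rule behind df_long can also be computed directly: for each id, certificate is 1 when year is at or after that id's certificate year, and NA otherwise. A minimal base-R sketch, assuming the year range is taken from the data (2000 to 2004) and using a hypothetical helper column start_year:

```r
# The question's data frame (certificate is always 1 here)
df <- data.frame(id = 1:4,
                 year = c(2000, 2003, 2002, 2004),
                 certificate = 1)

# Every id paired with every year in the observed range
grid <- expand.grid(id = df$id, year = min(df$year):max(df$year))

# Look up each id's certificate year (start_year is a hypothetical helper column)
grid$start_year <- df$year[match(grid$id, df$id)]

# 1 from the certificate year onward, NA before it
grid$certificate <- ifelse(grid$year >= grid$start_year, 1, NA)

# Sort by id, then year, and drop the helper column
df_long <- grid[order(grid$id, grid$year), c("id", "year", "certificate")]
rownames(df_long) <- NULL
head(df_long, 10)
```

This only works because each id has a single certificate row; with several rows per id, start_year would need to be the minimum certificate year per id.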

Answer:

Two steps. First, create all combinations of id and year:

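# The question's data frame
df <- data.frame(id = 1:4, year = c(2000, 2003, 2002, 2004), certificate = 1)

# Full outer join with every id/year combination; added rows get NA
tmp <- merge(
  df,
  expand.grid(year = 2000:2004, id = 1:4),
  all = TRUE
)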

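Then replace the missing values with 0 and take a cumulative sum within each id (merge() sorts the result by id and year, so the sum runs in year order):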

# NA -> 0, then a running total per id: 0 before the certificate year, 1 from it on
tmp$certificate[is.na(tmp$certificate)] <- 0
tmp$certificate2 <- ave(tmp$certificate, tmp$id, FUN = cumsum)
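The cumulative sum gives 0, not NA, before the certificate year (and it would count past 1 if an id had more than one certificate row). To get the NA output shown in the question, a minimal sketch that repeats the two steps above and then turns the leading zeros into NA:

```r
# Rebuild the inputs and repeat the two steps from this answer
df <- data.frame(id = 1:4,
                 year = c(2000, 2003, 2002, 2004),
                 certificate = 1)
tmp <- merge(df, expand.grid(year = 2000:2004, id = 1:4), all = TRUE)
tmp$certificate[is.na(tmp$certificate)] <- 0
tmp$certificate2 <- ave(tmp$certificate, tmp$id, FUN = cumsum)

# Years before the first certificate have a running total of 0; mark them NA
tmp$certificate2[tmp$certificate2 == 0] <- NA

# Show one id to compare with the question's df_long
tmp[tmp$id == 2, ]
```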

...
   id year certificate certificate2
11  3 2000           0            0
12  3 2001           0            0
13  3 2002           1            1
14  3 2003           0            1
15  3 2004           0            1
16  4 2000           0            0
17  4 2001           0            0
18  4 2002           0            0
19  4 2003           0            0
20  4 2004           1            1

Answer:

A tidyverse solution: first expand the data frame to cover all combinations of id and year, then, within each id, carry the certificate value of 1 forward to every year after the first year where certificate == 1.

library(tibble)
library(dplyr)
library(tidyr)  # crossing() comes from tidyr
library(purrr)

df <- tribble(~id,   ~year,  ~certificate,
              1,   2000,   1,
              2,   2003,   1,
              3,   2002,   1,
              4,   2004,   1)

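df |>
    # add every id/year combination; the new rows get certificate = NA
    right_join(crossing(id = 1:4, year = 2000:2004), by = c("id", "year")) |>
    group_by(id) |>
    arrange(year) |>
    # within each id, carry the first non-NA value forward to later years
    mutate(certificate = accumulate(certificate,
                                    ~if (is.na(.x)) .y else .x)) |>
    arrange(id, year)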

##> # A tibble: 20 × 3
##> # Groups:   id [4]
##>       id  year certificate
##>    <dbl> <dbl>       <dbl>
##>  1     1  2000           1
##>  2     1  2001           1
##>  3     1  2002           1
##>  4     1  2003           1
##>  5     1  2004           1
##>  6     2  2000          NA
##>  7     2  2001          NA
##>  8     2  2002          NA
##>  9     2  2003           1
##> 10     2  2004           1
##> 11     3  2000          NA
##> 12     3  2001          NA
##> 13     3  2002           1
##> 14     3  2003           1
##> 15     3  2004           1
##> 16     4  2000          NA
##> 17     4  2001          NA
##> 18     4  2002          NA
##> 19     4  2003          NA
##> 20     4  2004           1
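The accumulate() step is a forward fill, so tidyr's fill() can do the same job on grouped data. A minimal sketch, assuming R >= 4.1 (for |>) with dplyr and tidyr installed; integer id and year columns are used so both sides of the join have the same key types:

```r
library(dplyr)
library(tidyr)

df <- tibble(id = 1:4,
             year = c(2000L, 2003L, 2002L, 2004L),
             certificate = 1)

df_long <- df |>
  # all id x year combinations, keeping the original certificate rows
  right_join(crossing(id = 1:4, year = 2000:2004), by = c("id", "year")) |>
  arrange(id, year) |>
  group_by(id) |>
  # copy the last non-missing certificate down to later years of the same id
  fill(certificate, .direction = "down") |>
  ungroup()

df_long
```

Sorting by id and year before fill() matters, because fill() works in row order.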