Home > Software engineering >  'Stretch' a grouped data frame using dplyr
'Stretch' a grouped data frame using dplyr

Time:09-05

I have a grouped dataframe with multiple IDs with a date and value column.

id <- c("a", "a", "a", "b", "b", "b", "c")
date <- c("2020-01-01", "2020-01-02", "2020-01-03",
          "2020-01-01", "2020-01-02", "2020-01-03",
          "2020-01-01")
value <- rnorm(n = length(id))
df <- cbind.data.frame(id, date, value)

However, some IDs have less than 3 dates. I want to "stretch" those IDs and add an NA for the value column for the new dates. In this dataframe, the "c" ID would have two new dates added ("2020-01-02" and "2020-01-03").

CodePudding user response:

Perhaps this approach would suit?

library(tidyverse)

id <- c("a", "a", "a", "b", "b", "b", "c")
date <- c("2020-01-01", "2020-01-02", "2020-01-03",
          "2020-01-01", "2020-01-02", "2020-01-03",
          "2020-01-01")
value <- rnorm(n = length(id))
df <- cbind.data.frame(id, date, value)

df %>%
  right_join(df %>% expand(id, date))
#> Joining, by = c("id", "date")
#>   id       date      value
#> 1  a 2020-01-01 -1.5371474
#> 2  a 2020-01-02  0.9001098
#> 3  a 2020-01-03  0.1523491
#> 4  b 2020-01-01  0.8194577
#> 5  b 2020-01-02  1.2005270
#> 6  b 2020-01-03  0.1158812
#> 7  c 2020-01-01 -0.8676445
#> 8  c 2020-01-02         NA
#> 9  c 2020-01-03         NA

Created on 2022-09-05 by the reprex package (v2.0.1)

CodePudding user response:

You could use complete() from tidyr.

library(tidyr)

df %>%
  complete(id, date)

# # A tibble: 9 × 3
#   id    date        value
#   <chr> <chr>       <dbl>
# 1 a     2020-01-01  1.12
# 2 a     2020-01-02  1.58
# 3 a     2020-01-03  1.26
# 4 b     2020-01-01 -2.30
# 5 b     2020-01-02 -1.45
# 6 b     2020-01-03 -0.212
# 7 c     2020-01-01  0.344
# 8 c     2020-01-02 NA
# 9 c     2020-01-03 NA

CodePudding user response:

In base R, by id you may merge with a data frame created out of sequences of the date range. First of all you want to use proper date format by doing df$date <- as.Date(df$date).

by(df, df$id, \(x) 
   merge(x, 
         data.frame(id=el(x$id), 
                    date=do.call(seq.Date, c(as.list(range(df$date)), 'day'))), 
         all=TRUE)) |>
  do.call(what=rbind)
#     id       date      value
# a.1  a 2020-01-01  1.3709584
# a.2  a 2020-01-02 -0.5646982
# a.3  a 2020-01-03  0.3631284
# b.1  b 2020-01-01  0.6328626
# b.2  b 2020-01-02  0.4042683
# b.3  b 2020-01-03 -0.1061245
# c.1  c 2020-01-01  1.5115220
# c.2  c 2020-01-02         NA
# c.3  c 2020-01-03         NA
  • Related