Home > other >  Complete dates and values with NA per group in R
Complete dates and values with NA per group in R

Time:12-16

I have the following dataframe called df (dput below):

  group       date value
1     A 2022-12-01     1
2     A 2022-12-03     3
3     A 2022-12-06    NA
4     A 2022-12-08     1
5     B 2022-12-01     2
6     B 2022-12-05     2
7     C 2022-12-01     4
8     C 2022-12-06    NA
9     C 2022-12-08     6

I would like to complete the dates and values per group. This would result in the following desired output:

   group       date value
 1 A     2022-12-01     1
 2 A     2022-12-02     1
 3 A     2022-12-03     3
 4 A     2022-12-04     3
 5 A     2022-12-05     3
 6 A     2022-12-06    NA
 7 A     2022-12-07    NA
 8 A     2022-12-08     1
 9 B     2022-12-01     2
10 B     2022-12-02     2
11 B     2022-12-03     2
12 B     2022-12-04     2
13 B     2022-12-05     2
14 C     2022-12-01     4
15 C     2022-12-02     4
16 C     2022-12-03     4
17 C     2022-12-04     4
18 C     2022-12-05     4
19 C     2022-12-06    NA
20 C     2022-12-07    NA
21 C     2022-12-08     6

The problem is that I would like to fill the values but not all NA because some should stay NA because their started row of df has an NA like row 3 and 8 in df. As you can see in the desired output the next rows still have NA like row 7 and 20 from desired output. When I run the following code, it fixes the dates but not the values because they are all NA:

library(dplyr)
library(tidyr)
df %>%
  group_by(group) %>%
  complete(., date = seq(min(date), max(date), by = 'day', fill = list(value = NA)))
#> # A tibble: 21 × 3
#> # Groups:   group [3]
#>    group date                value
#>    <chr> <dttm>              <dbl>
#>  1 A     2022-12-01 00:00:00     1
#>  2 A     2022-12-02 00:00:00    NA
#>  3 A     2022-12-03 00:00:00     3
#>  4 A     2022-12-04 00:00:00    NA
#>  5 A     2022-12-05 00:00:00    NA
#>  6 A     2022-12-06 00:00:00    NA
#>  7 A     2022-12-07 00:00:00    NA
#>  8 A     2022-12-08 00:00:00     1
#>  9 B     2022-12-01 00:00:00     2
#> 10 B     2022-12-02 00:00:00    NA
#> # … with 11 more rows

When I add the fill function it of course fills all the values:

library(dplyr)
library(tidyr)
df %>%
  group_by(group) %>%
  complete(., date = seq(min(date), max(date), by = 'day', fill = list(value = NA))) %>%
  fill(value, .direction = 'down')
#> # A tibble: 21 × 3
#> # Groups:   group [3]
#>    group date                value
#>    <chr> <dttm>              <dbl>
#>  1 A     2022-12-01 00:00:00     1
#>  2 A     2022-12-02 00:00:00     1
#>  3 A     2022-12-03 00:00:00     3
#>  4 A     2022-12-04 00:00:00     3
#>  5 A     2022-12-05 00:00:00     3
#>  6 A     2022-12-06 00:00:00     3
#>  7 A     2022-12-07 00:00:00     3
#>  8 A     2022-12-08 00:00:00     1
#>  9 B     2022-12-01 00:00:00     2
#> 10 B     2022-12-02 00:00:00     2
#> # … with 11 more rows

Created on 2022-12-15 with reprex v2.0.2

So I was wondering if anyone knows how to complete by the dates and values and take care of the NA's?


dput df:

df <- structure(list(group = c("A", "A", "A", "A", "B", "B", "C", "C", 
"C"), date = structure(c(1669849200, 1670022000, 1670281200, 
1670454000, 1669849200, 1670194800, 1669849200, 1670281200, 1670454000
), class = c("POSIXct", "POSIXt"), tzone = ""), value = c(1, 
3, NA, 1, 2, 2, 4, NA, 6)), class = "data.frame", row.names = c(NA, 
-9L))

CodePudding user response:

What about changing NA values in the first place?

df %>%
  replace_na(list(value = 0)) %>% 
  complete(group, date = seq(min(date), max(date), by = 'day')) %>% 
  fill(value) %>% 
  mutate(value = na_if(value, 0))

output

# A tibble: 24 × 3
   group date                value
   <chr> <dttm>              <dbl>
 1 A     2022-12-01 00:00:00     1
 2 A     2022-12-02 00:00:00     1
 3 A     2022-12-03 00:00:00     3
 4 A     2022-12-04 00:00:00     3
 5 A     2022-12-05 00:00:00     3
 6 A     2022-12-06 00:00:00    NA
 7 A     2022-12-07 00:00:00    NA
 8 A     2022-12-08 00:00:00     1
 9 B     2022-12-01 00:00:00     2
10 B     2022-12-02 00:00:00     2
11 B     2022-12-03 00:00:00     2
12 B     2022-12-04 00:00:00     2
13 B     2022-12-05 00:00:00     2
14 B     2022-12-06 00:00:00     2
15 B     2022-12-07 00:00:00     2
16 B     2022-12-08 00:00:00     2
17 C     2022-12-01 00:00:00     4
18 C     2022-12-02 00:00:00     4
19 C     2022-12-03 00:00:00     4
20 C     2022-12-04 00:00:00     4
21 C     2022-12-05 00:00:00     4
22 C     2022-12-06 00:00:00    NA
23 C     2022-12-07 00:00:00    NA
24 C     2022-12-08 00:00:00     6

CodePudding user response:

Using data.table:

library(data.table)
setDT(df)
df2 <- 
  df[, .(date = seq(from = first(date), to = last(date), "day")), by = group
     ][, value := df[.SD, on = .(group, date), value, roll = TRUE]]

df2
#      group                date value
#     <char>              <POSc> <num>
#  1:      A 2022-11-30 23:00:00     1
#  2:      A 2022-12-01 23:00:00     1
#  3:      A 2022-12-02 23:00:00     3
#  4:      A 2022-12-03 23:00:00     3
#  5:      A 2022-12-04 23:00:00     3
#  6:      A 2022-12-05 23:00:00    NA
#  7:      A 2022-12-06 23:00:00    NA
#  8:      A 2022-12-07 23:00:00     1
#  9:      B 2022-11-30 23:00:00     2
# 10:      B 2022-12-01 23:00:00     2
# 11:      B 2022-12-02 23:00:00     2
# 12:      B 2022-12-03 23:00:00     2
# 13:      B 2022-12-04 23:00:00     2
# 14:      C 2022-11-30 23:00:00     4
# 15:      C 2022-12-01 23:00:00     4
# 16:      C 2022-12-02 23:00:00     4
# 17:      C 2022-12-03 23:00:00     4
# 18:      C 2022-12-04 23:00:00     4
# 19:      C 2022-12-05 23:00:00    NA
# 20:      C 2022-12-06 23:00:00    NA
# 21:      C 2022-12-07 23:00:00     6
#      group                date value
  • Related