I have the following dataframe called df (dput
below):
group date value
1 A 2022-12-01 1
2 A 2022-12-03 3
3 A 2022-12-06 NA
4 A 2022-12-08 1
5 B 2022-12-01 2
6 B 2022-12-05 2
7 C 2022-12-01 4
8 C 2022-12-06 NA
9 C 2022-12-08 6
I would like to complete
the dates and values per group. This would result in the following desired output:
group date value
1 A 2022-12-01 1
2 A 2022-12-02 1
3 A 2022-12-03 3
4 A 2022-12-04 3
5 A 2022-12-05 3
6 A 2022-12-06 NA
7 A 2022-12-07 NA
8 A 2022-12-08 1
9 B 2022-12-01 2
10 B 2022-12-02 2
11 B 2022-12-03 2
12 B 2022-12-04 2
13 B 2022-12-05 2
14 C 2022-12-01 4
15 C 2022-12-02 4
16 C 2022-12-03 4
17 C 2022-12-04 4
18 C 2022-12-05 4
19 C 2022-12-06 NA
20 C 2022-12-07 NA
21 C 2022-12-08 6
The problem is that I would like to fill the values but not all NA because some should stay NA because their started row of df has an NA like row 3 and 8 in df. As you can see in the desired output the next rows still have NA like row 7 and 20 from desired output. When I run the following code, it fixes the dates but not the values because they are all NA:
library(dplyr)
library(tidyr)
df %>%
group_by(group) %>%
complete(., date = seq(min(date), max(date), by = 'day', fill = list(value = NA)))
#> # A tibble: 21 × 3
#> # Groups: group [3]
#> group date value
#> <chr> <dttm> <dbl>
#> 1 A 2022-12-01 00:00:00 1
#> 2 A 2022-12-02 00:00:00 NA
#> 3 A 2022-12-03 00:00:00 3
#> 4 A 2022-12-04 00:00:00 NA
#> 5 A 2022-12-05 00:00:00 NA
#> 6 A 2022-12-06 00:00:00 NA
#> 7 A 2022-12-07 00:00:00 NA
#> 8 A 2022-12-08 00:00:00 1
#> 9 B 2022-12-01 00:00:00 2
#> 10 B 2022-12-02 00:00:00 NA
#> # … with 11 more rows
When I add the fill
function it of course fills all the values:
library(dplyr)
library(tidyr)
df %>%
group_by(group) %>%
complete(., date = seq(min(date), max(date), by = 'day', fill = list(value = NA))) %>%
fill(value, .direction = 'down')
#> # A tibble: 21 × 3
#> # Groups: group [3]
#> group date value
#> <chr> <dttm> <dbl>
#> 1 A 2022-12-01 00:00:00 1
#> 2 A 2022-12-02 00:00:00 1
#> 3 A 2022-12-03 00:00:00 3
#> 4 A 2022-12-04 00:00:00 3
#> 5 A 2022-12-05 00:00:00 3
#> 6 A 2022-12-06 00:00:00 3
#> 7 A 2022-12-07 00:00:00 3
#> 8 A 2022-12-08 00:00:00 1
#> 9 B 2022-12-01 00:00:00 2
#> 10 B 2022-12-02 00:00:00 2
#> # … with 11 more rows
Created on 2022-12-15 with reprex v2.0.2
So I was wondering if anyone knows how to complete by the dates and values and take care of the NA's?
dput
df:
df <- structure(list(group = c("A", "A", "A", "A", "B", "B", "C", "C",
"C"), date = structure(c(1669849200, 1670022000, 1670281200,
1670454000, 1669849200, 1670194800, 1669849200, 1670281200, 1670454000
), class = c("POSIXct", "POSIXt"), tzone = ""), value = c(1,
3, NA, 1, 2, 2, 4, NA, 6)), class = "data.frame", row.names = c(NA,
-9L))
CodePudding user response:
What about changing NA values in the first place?
df %>%
replace_na(list(value = 0)) %>%
complete(group, date = seq(min(date), max(date), by = 'day')) %>%
fill(value) %>%
mutate(value = na_if(value, 0))
output
# A tibble: 24 × 3
group date value
<chr> <dttm> <dbl>
1 A 2022-12-01 00:00:00 1
2 A 2022-12-02 00:00:00 1
3 A 2022-12-03 00:00:00 3
4 A 2022-12-04 00:00:00 3
5 A 2022-12-05 00:00:00 3
6 A 2022-12-06 00:00:00 NA
7 A 2022-12-07 00:00:00 NA
8 A 2022-12-08 00:00:00 1
9 B 2022-12-01 00:00:00 2
10 B 2022-12-02 00:00:00 2
11 B 2022-12-03 00:00:00 2
12 B 2022-12-04 00:00:00 2
13 B 2022-12-05 00:00:00 2
14 B 2022-12-06 00:00:00 2
15 B 2022-12-07 00:00:00 2
16 B 2022-12-08 00:00:00 2
17 C 2022-12-01 00:00:00 4
18 C 2022-12-02 00:00:00 4
19 C 2022-12-03 00:00:00 4
20 C 2022-12-04 00:00:00 4
21 C 2022-12-05 00:00:00 4
22 C 2022-12-06 00:00:00 NA
23 C 2022-12-07 00:00:00 NA
24 C 2022-12-08 00:00:00 6
CodePudding user response:
Using data.table
:
library(data.table)
setDT(df)
df2 <-
df[, .(date = seq(from = first(date), to = last(date), "day")), by = group
][, value := df[.SD, on = .(group, date), value, roll = TRUE]]
df2
# group date value
# <char> <POSc> <num>
# 1: A 2022-11-30 23:00:00 1
# 2: A 2022-12-01 23:00:00 1
# 3: A 2022-12-02 23:00:00 3
# 4: A 2022-12-03 23:00:00 3
# 5: A 2022-12-04 23:00:00 3
# 6: A 2022-12-05 23:00:00 NA
# 7: A 2022-12-06 23:00:00 NA
# 8: A 2022-12-07 23:00:00 1
# 9: B 2022-11-30 23:00:00 2
# 10: B 2022-12-01 23:00:00 2
# 11: B 2022-12-02 23:00:00 2
# 12: B 2022-12-03 23:00:00 2
# 13: B 2022-12-04 23:00:00 2
# 14: C 2022-11-30 23:00:00 4
# 15: C 2022-12-01 23:00:00 4
# 16: C 2022-12-02 23:00:00 4
# 17: C 2022-12-03 23:00:00 4
# 18: C 2022-12-04 23:00:00 4
# 19: C 2022-12-05 23:00:00 NA
# 20: C 2022-12-06 23:00:00 NA
# 21: C 2022-12-07 23:00:00 6
# group date value