I've got a table that looks something like the following
Date | Key | Metric |
---|---|---|
2021-01-01 | A | 6 |
2021-02-01 | A | 3 |
2021-05-01 | A | 3 |
2021-03-01 | B | 4 |
2021-04-01 | B | 1 |
2021-05-01 | B | 2 |
What I'd like to do is insert a row with a metric of 0 for Key A for the date of 2021-03-01, since Key A had already appeared in January in February.
Key B, on the other hand, would ideally stay untouched since it has metrics associated with every date after its appearance. (The table I'm working with happens to be monthly, but I'm sure I could make the changes to make a daily solution work here)
So, Ideally we'd end up with a table looking like the following
Date | Key | Metric |
---|---|---|
2021-01-01 | A | 6 |
2021-02-01 | A | 3 |
2021-03-01 | A | 0 |
2021-04-01 | A | 0 |
2021-05-01 | A | 3 |
2021-03-01 | B | 4 |
2021-04-01 | B | 1 |
2021-05-01 | B | 2 |
I'm thinking this may be better suited for SQL, but would like to try and R -- should I be going down the path of some loop to check every key's first date, check if it has an entry for the following date, and adding if not? It feels like that would get fairly unwieldy fairly quickly.
That's all for now, thank you very much everyone
CodePudding user response:
We could use complete
after grouping:
library(lubridate) # formatting date
library(dplyr)
df %>%
mutate(Date = as.Date(ydm(Date))) %>% # you don't need this if your date is in correct format
group_by(Key) %>%
complete(Date = seq(min(Date), max(Date), by = "1 day"),
fill = list(Metric = 0))
Key Date Metric
<chr> <date> <dbl>
1 A 2021-01-01 6
2 A 2021-01-02 3
3 A 2021-01-03 0
4 A 2021-01-04 0
5 A 2021-01-05 3
6 B 2021-01-03 4
7 B 2021-01-04 1
8 B 2021-01-05 2
CodePudding user response:
I bet there's a shorter way, but this should work:
library(tidyverse)
df %>%
# get min and max date within Key
group_by(Key) %>%
mutate(minDate = min(Date), maxDate = max(Date)) %>%
ungroup() %>%
# make row for every Key / Date combo
complete(Key, Date, fill = list(Metric = 0)) %>%
# only keep the ones in that Key's range
filter(Date >= minDate, Date <= maxDate)