Home > Mobile >  Creating 0s for dates currently without data, after a key's first appearance
Creating 0s for dates currently without data, after a key's first appearance

Time:10-21

I've got a table that looks something like the following

Date Key Metric
2021-01-01 A 6
2021-02-01 A 3
2021-05-01 A 3
2021-03-01 B 4
2021-04-01 B 1
2021-05-01 B 2

What I'd like to do is insert a row with a metric of 0 for Key A for the date of 2021-03-01, since Key A had already appeared in January in February.

Key B, on the other hand, would ideally stay untouched since it has metrics associated with every date after its appearance. (The table I'm working with happens to be monthly, but I'm sure I could make the changes to make a daily solution work here)

So, Ideally we'd end up with a table looking like the following

Date Key Metric
2021-01-01 A 6
2021-02-01 A 3
2021-03-01 A 0
2021-04-01 A 0
2021-05-01 A 3
2021-03-01 B 4
2021-04-01 B 1
2021-05-01 B 2

I'm thinking this may be better suited for SQL, but would like to try and R -- should I be going down the path of some loop to check every key's first date, check if it has an entry for the following date, and adding if not? It feels like that would get fairly unwieldy fairly quickly.

That's all for now, thank you very much everyone

CodePudding user response:

We could use complete after grouping:

library(lubridate) # formatting date
library(dplyr)
df %>% 
  mutate(Date = as.Date(ydm(Date))) %>%  # you don't need this if your date is in correct format
  group_by(Key) %>% 
  complete(Date = seq(min(Date), max(Date), by = "1 day"),
           fill = list(Metric = 0))
  Key   Date       Metric
  <chr> <date>      <dbl>
1 A     2021-01-01      6
2 A     2021-01-02      3
3 A     2021-01-03      0
4 A     2021-01-04      0
5 A     2021-01-05      3
6 B     2021-01-03      4
7 B     2021-01-04      1
8 B     2021-01-05      2

CodePudding user response:

I bet there's a shorter way, but this should work:

library(tidyverse)
df %>%

  # get min and max date within Key
  group_by(Key) %>%
  mutate(minDate = min(Date), maxDate = max(Date)) %>%
  ungroup() %>%

  # make row for every Key / Date combo
  complete(Key, Date, fill = list(Metric = 0)) %>%

  # only keep the ones in that Key's range
  filter(Date >= minDate, Date <= maxDate)
  • Related