I have a dataset that looks like this :
vaR | date |
---|---|
A | 1/1/2022 |
A | 1/2/2022 |
A | 1/3/2022 |
B | 1/1/2022 |
B | 1/3/2022 |
C | 1/1/2022 |
C | 1/1/2022 |
C | 1/2/2022 |
C | 1/2/2022 |
C | 1/3/2022 |
And i want to be arranged by month and by the var variable. But if a month is not recorded (missing) i want to be added (to be appeared ) in the new column named Month and to mutate (dplyr phrase) another column that will check if there was an entry on that month (logical condition).But there are some entries for example C that has more that one entries which counts for one (distinct).
Ideally is must look like this :
var | Quarter | Month | Condition |
---|---|---|---|
A | 1 | 1 | TRUE |
A | 1 | 2 | TRUE |
A | 1 | 3 | TRUE |
B | 1 | 1 | TRUE |
B | 1 | 2 | FALSE |
B | 1 | 3 | TRUE |
C | 1 | 1 | TRUE |
C | 1 | 2 | TRUE |
C | 1 | 3 | TRUE |
As a start i have tried this one in R :
var = c(rep("A",3),rep("B",2),rep("C",5));var
date = c(as.Date("2022/01/01"),as.Date("2022/02/01"),as.Date("2022/03/01"),
as.Date("2022/01/01"),as.Date("2022/03/01"),
as.Date("2022/01/01"),as.Date("2022/01/01"),as.Date("2022/02/01"),as.Date("2022/02/01"),as.Date("2022/03/01"))
data = tibble(var,date)
quarter = 1
data%>%
dplyr::mutate(month = lubridate::month(date),
Quarter = quarter)
But i don't know how to add the missing month and check for the verified condition. Any help ?
CodePudding user response:
You can use complete()
to fill in the missing months and then check whether they have an associated date, then use distinct()
to find the unique combinations.
library(dplyr)
library(tidyr)
var = c(rep("A",3),rep("B",2),rep("C",5))
date = c(as.Date("2022/01/01"),as.Date("2022/02/01"),as.Date("2022/03/01"),
as.Date("2022/01/01"),as.Date("2022/03/01"),
as.Date("2022/01/01"),as.Date("2022/01/01"),as.Date("2022/02/01"),as.Date("2022/02/01"),as.Date("2022/03/01"))
data = tibble(var,date)
quarter = 1
data %>%
mutate(month = lubridate::month(date)) %>%
complete(var, month) %>%
mutate(Quarter = quarter,
Condition = !is.na(date)) %>%
distinct(var, month, Quarter, Condition)
#> # A tibble: 9 × 4
#> var month Quarter Condition
#> <chr> <dbl> <dbl> <lgl>
#> 1 A 1 1 TRUE
#> 2 A 2 1 TRUE
#> 3 A 3 1 TRUE
#> 4 B 1 1 TRUE
#> 5 B 2 1 FALSE
#> 6 B 3 1 TRUE
#> 7 C 1 1 TRUE
#> 8 C 2 1 TRUE
#> 9 C 3 1 TRUE
Created on 2022-06-01 by the reprex package (v2.0.1)
CodePudding user response:
You can approach it this way:
library(lubridate)
library(dplyr)
libraty(tidyr)
df <- df %>% mutate(month=month(date),quarter=quarter(month))
left_join(
expand(df, var,month,quarter),
select(df,var, month) %>% mutate(condition=TRUE) %>% distinct()
) %>% mutate(condition=!is.na(condition))
Output
var month quarter condition
<chr> <dbl> <int> <lgl>
1 A 1 1 TRUE
2 A 2 1 TRUE
3 A 3 1 TRUE
4 B 1 1 TRUE
5 B 2 1 FALSE
6 B 3 1 TRUE
7 C 1 1 TRUE
8 C 2 1 TRUE
9 C 3 1 TRUE