Home > OS >  How can i add missing month value and remove duplicate in dplyr in R?
How can i add missing month value and remove duplicate in dplyr in R?

Time:06-02

I have a dataset that looks like this :

vaR date
A 1/1/2022
A 1/2/2022
A 1/3/2022
B 1/1/2022
B 1/3/2022
C 1/1/2022
C 1/1/2022
C 1/2/2022
C 1/2/2022
C 1/3/2022

And i want to be arranged by month and by the var variable. But if a month is not recorded (missing) i want to be added (to be appeared ) in the new column named Month and to mutate (dplyr phrase) another column that will check if there was an entry on that month (logical condition).But there are some entries for example C that has more that one entries which counts for one (distinct).

Ideally is must look like this :

var Quarter Month Condition
A 1 1 TRUE
A 1 2 TRUE
A 1 3 TRUE
B 1 1 TRUE
B 1 2 FALSE
B 1 3 TRUE
C 1 1 TRUE
C 1 2 TRUE
C 1 3 TRUE

As a start i have tried this one in R :

var = c(rep("A",3),rep("B",2),rep("C",5));var
date = c(as.Date("2022/01/01"),as.Date("2022/02/01"),as.Date("2022/03/01"),
         as.Date("2022/01/01"),as.Date("2022/03/01"),
         as.Date("2022/01/01"),as.Date("2022/01/01"),as.Date("2022/02/01"),as.Date("2022/02/01"),as.Date("2022/03/01"))
data = tibble(var,date)
quarter = 1
data%>%
  dplyr::mutate(month = lubridate::month(date),
                Quarter = quarter)

But i don't know how to add the missing month and check for the verified condition. Any help ?

CodePudding user response:

You can use complete() to fill in the missing months and then check whether they have an associated date, then use distinct() to find the unique combinations.

library(dplyr)
library(tidyr)

var = c(rep("A",3),rep("B",2),rep("C",5))
date = c(as.Date("2022/01/01"),as.Date("2022/02/01"),as.Date("2022/03/01"),
         as.Date("2022/01/01"),as.Date("2022/03/01"),
         as.Date("2022/01/01"),as.Date("2022/01/01"),as.Date("2022/02/01"),as.Date("2022/02/01"),as.Date("2022/03/01"))
data = tibble(var,date)
quarter = 1
data %>% 
  mutate(month = lubridate::month(date)) %>% 
  complete(var, month) %>% 
  mutate(Quarter = quarter,
         Condition = !is.na(date)) %>% 
  distinct(var, month, Quarter, Condition)
#> # A tibble: 9 × 4
#>   var   month Quarter Condition
#>   <chr> <dbl>   <dbl> <lgl>    
#> 1 A         1       1 TRUE     
#> 2 A         2       1 TRUE     
#> 3 A         3       1 TRUE     
#> 4 B         1       1 TRUE     
#> 5 B         2       1 FALSE    
#> 6 B         3       1 TRUE     
#> 7 C         1       1 TRUE     
#> 8 C         2       1 TRUE     
#> 9 C         3       1 TRUE

Created on 2022-06-01 by the reprex package (v2.0.1)

CodePudding user response:

You can approach it this way:

library(lubridate)
library(dplyr)
libraty(tidyr)

df <- df %>% mutate(month=month(date),quarter=quarter(month))
left_join(
  expand(df, var,month,quarter),
  select(df,var, month) %>% mutate(condition=TRUE) %>%  distinct()
) %>% mutate(condition=!is.na(condition))

Output

  var   month quarter condition
  <chr> <dbl>   <int> <lgl>    
1 A         1       1 TRUE     
2 A         2       1 TRUE     
3 A         3       1 TRUE     
4 B         1       1 TRUE     
5 B         2       1 FALSE    
6 B         3       1 TRUE     
7 C         1       1 TRUE     
8 C         2       1 TRUE     
9 C         3       1 TRUE  
  • Related