Home > Enterprise >  Merge two data with respect to date and week using R
Merge two data with respect to date and week using R

Time:07-26

I have two data. One has a year_month_week column and the other has a date column. I simplified and made two data for demonstration purpose.

df1<-data.frame(id=c(1,1,1,2,2,2,2),
               year_month_week=c(2022051,2022052,2022053,2022041,2022042,2022043,2022044),
               points=c(65,58,47,21,25,27,43))

df2<-data.frame(id=c(1,1,1,2,2,2),
                date=c(20220503,20220506,20220512,20220401,20220408,20220409),
                temperature=c(36.1,36.3,36.6,34.3,34.9,35.3))

For df1, 2022051 means 1st week of May,2022. Likewise, 2022052 means 2nd week of May,2022. For df2,20220503 means May 3rd, 2022. What I want to do now is merge df1 and df2 with respect to year_month_week. In this case, 20220503 and 20220506 are 1st week of May,2022. If more than one date are in year_month_week, I will just include the first of them. So my expected output is as follows:

df<-data.frame(id=c(1,1,2,2),
               year_month_week=c(2022051,2022052,2022041,2022042),
               points=c(65,58,21,25),
               temperature=c(36.1,36.6,34.3,34.9))

CodePudding user response:

One way of doing it is to extract the last two digits of your date column in df2, divide the digits by 7, then round them up. This would be your week number (this part is in the mutate function).

Then just group_by the year_month_week column and only output one record per year_month_week, and join with df1.

library(tidyverse)
library(stringr)

df <- df2 %>% 
  mutate(year_month_week = 
           as.integer(
             paste0(str_extract(df2$date, ".*(?=\\d\\d$)"),
                    ceiling(as.integer(str_extract(df2$date, "\\d\\d$"))/7))
             )) %>% 
  group_by(year_month_week) %>% 
  slice_min(date) %>% 
  left_join(df1,
            by = c("year_month_week", "id")) %>% 
  select(-date)

df
# A tibble: 4 × 4
# Groups:   year_month_week [4]
     id temperature year_month_week points
  <dbl>       <dbl>           <dbl>  <dbl>
1     2        34.3         2022041     21
2     2        34.9         2022042     25
3     1        36.1         2022051     65
4     1        36.6         2022052     58
  • Related