I have two data. One has a year_month_week
column and the other has a date
column. I simplified and made two data for demonstration purpose.
df1<-data.frame(id=c(1,1,1,2,2,2,2),
year_month_week=c(2022051,2022052,2022053,2022041,2022042,2022043,2022044),
points=c(65,58,47,21,25,27,43))
df2<-data.frame(id=c(1,1,1,2,2,2),
date=c(20220503,20220506,20220512,20220401,20220408,20220409),
temperature=c(36.1,36.3,36.6,34.3,34.9,35.3))
For df1
, 2022051
means 1st week of May,2022. Likewise, 2022052
means 2nd week of May,2022. For df2
,20220503
means May 3rd, 2022.
What I want to do now is merge df1
and df2
with respect to year_month_week
. In this case, 20220503
and 20220506
are 1st week of May,2022. If more than one date
are in year_month_week
, I will just include the first of them. So my expected output is as follows:
df<-data.frame(id=c(1,1,2,2),
year_month_week=c(2022051,2022052,2022041,2022042),
points=c(65,58,21,25),
temperature=c(36.1,36.6,34.3,34.9))
CodePudding user response:
One way of doing it is to extract the last two digits of your date
column in df2
, divide the digits by 7, then round them up. This would be your week
number (this part is in the mutate
function).
Then just group_by
the year_month_week
column and only output one record per year_month_week
, and join with df1
.
library(tidyverse)
library(stringr)
df <- df2 %>%
mutate(year_month_week =
as.integer(
paste0(str_extract(df2$date, ".*(?=\\d\\d$)"),
ceiling(as.integer(str_extract(df2$date, "\\d\\d$"))/7))
)) %>%
group_by(year_month_week) %>%
slice_min(date) %>%
left_join(df1,
by = c("year_month_week", "id")) %>%
select(-date)
df
# A tibble: 4 × 4
# Groups: year_month_week [4]
id temperature year_month_week points
<dbl> <dbl> <dbl> <dbl>
1 2 34.3 2022041 21
2 2 34.9 2022042 25
3 1 36.1 2022051 65
4 1 36.6 2022052 58