Aggregate week and date in R by some specific rules

Time:07-27

I'm not used to using R. I already asked a question on Stack Overflow ("Merge two data with respect to date and week using R") and got a great answer. I'm sorry to post a similar question, but I tried many times and did not get the output I expected. This time, I want to do something slightly different from my previous question. I have two data frames: one has a year_month_week column and the other has a date column.

df1<-data.frame(id=c(1,1,1,2,2,2,2),
               year_month_week=c(2022051,2022052,2022053,2022041,2022042,2022043,2022044),
               points=c(65,58,47,21,25,27,43))

df2<-data.frame(id=c(1,1,1,2,2,2),
                date=c(20220503,20220506,20220512,20220401,20220408,20220409),
                temperature=c(36.1,36.3,36.6,34.3,34.9,35.3))

In df1, 2022051 means the 1st week of May 2022; likewise, 2022052 means the 2nd week of May 2022. In df2, 20220503 means May 3rd, 2022. What I want to do now is merge df1 and df2 on year_month_week. In this case, 20220503 and 20220506 both fall in the 1st week of May 2022. If more than one date falls in a year_month_week, I will include only the first of them. Now, here's the different part: if no date falls in a year_month_week, the temperature should just be NA. So my expected output has the same number of rows as df1 and includes the year_month_week column. My expected output is as follows:

df<-data.frame(id=c(1,1,1,2,2,2,2),
               year_month_week=c(2022051,2022052,2022053,2022041,2022042,2022043,2022044),
               points=c(65,58,47,21,25,27,43),
               temperature=c(36.1,36.6,NA,34.3,34.9,NA,NA))

CodePudding user response:

First we can convert the dates in df2 into the same year_month_week format used in df1, then join the two tables:

library(dplyr);library(lubridate)
df2$dt = ymd(df2$date)
df2$wk = day(df2$dt) %/% 7 + 1
df2$year_month_week = as.numeric(paste0(format(df2$dt, "%Y%m"), df2$wk))

df1 %>%
  left_join(df2 %>% group_by(year_month_week) %>% slice(1) %>%
              select(year_month_week, temperature))

Result

Joining, by = "year_month_week"
  id year_month_week points temperature
1  1         2022051     65        36.1
2  1         2022052     58        36.6
3  1         2022053     47          NA
4  2         2022041     21        34.3
5  2         2022042     25        34.9
6  2         2022043     27          NA
7  2         2022044     43          NA
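One caveat, offered as a hedged aside: the question does not say whether the 7th of a month belongs to week 1 or week 2. The formula above, `day %/% 7 + 1`, puts it in week 2, while a ceiling-based formula puts it in week 1. A minimal base R sketch comparing the two on every possible day of a month (the names `wk_floor` and `wk_ceiling` are illustrative only):

```r
# Compare two week-of-month formulas on every possible day of a month.
days <- 1:31
wk_floor   <- days %/% 7 + 1      # formula used in the answer above
wk_ceiling <- ceiling(days / 7)   # alternative: days 1-7 count as week 1

# Days on which the two formulas disagree
disagree <- days[wk_floor != wk_ceiling]
print(disagree)
# [1]  7 14 21 28
```

None of the sample dates in df2 fall on those days, so both formulas give the same result for the data in the question.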

CodePudding user response:

You can build on a previous answer: take its function for computing the week of the month and use it to generate a join key in df2.

df1 <- data.frame(
  id=c(1,1,1,2,2,2,2),
  year_month_week=c(2022051,2022052,2022053,2022041,2022042,2022043,2022044),
  points=c(65,58,47,21,25,27,43))

df2 <- data.frame(
  id=c(1,1,1,2,2,2),
  date=c(20220503,20220506,20220512,20220401,20220408,20220409),
  temperature=c(36.1,36.3,36.6,34.3,34.9,35.3))

# Take the function from the previous StackOverflow question
monthweeks.Date <- function(x) {
  ceiling(as.numeric(format(x, "%d")) / 7)
}

# Create a year_month_week variable to join on 
df2 <- 
  df2 %>%
  mutate(
    date = lubridate::parse_date_time(
      x = date, 
      orders = "%Y%m%d"),
    # format(date, "%Y%m") zero-pads the month, so months 10-12 also work
    year_month_week = paste0(
      format(date, "%Y%m"),
      monthweeks.Date(date)),
    year_month_week = as.double(year_month_week)) 

# Remove duplicate year_month_weeks
df2 <- 
  df2 %>%
  arrange(year_month_week) %>% 
  distinct(year_month_week, .keep_all = TRUE)

# Join dataframes
df1 <- 
  left_join(
    df1, 
    df2, 
    by = "year_month_week")

Produces this result

  id.x year_month_week points id.y       date temperature
1    1         2022051     65    1 2022-05-03        36.1
2    1         2022052     58    1 2022-05-12        36.6
3    1         2022053     47   NA       <NA>          NA
4    2         2022041     21    2 2022-04-01        34.3
5    2         2022042     25    2 2022-04-08        34.9
6    2         2022043     27   NA       <NA>          NA
7    2         2022044     43   NA       <NA>          NA

Edit: I forgot to mention that you need the tidyverse loaded:

library(tidyverse)
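Both answers join on year_month_week alone, which is why the result above carries id.x and id.y columns, and why a reading from one id could in principle be matched to a row of a different id that shares the same week. Here is a minimal base R sketch that joins on id as well. It assumes temperatures should only be matched within the same id, which the question does not state explicitly, and the name `out` is illustrative only:

```r
df1 <- data.frame(id = c(1, 1, 1, 2, 2, 2, 2),
                  year_month_week = c(2022051, 2022052, 2022053,
                                      2022041, 2022042, 2022043, 2022044),
                  points = c(65, 58, 47, 21, 25, 27, 43))

df2 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                  date = c(20220503, 20220506, 20220512,
                           20220401, 20220408, 20220409),
                  temperature = c(36.1, 36.3, 36.6, 34.3, 34.9, 35.3))

# Parse the numeric dates and build the same year_month_week key as df1
dt <- as.Date(as.character(df2$date), format = "%Y%m%d")
wk <- ceiling(as.numeric(format(dt, "%d")) / 7)
df2$year_month_week <- as.numeric(paste0(format(dt, "%Y%m"), wk))

# Keep only the earliest reading per id and week
df2 <- df2[order(df2$id, df2$date), ]
df2 <- df2[!duplicated(df2[c("id", "year_month_week")]), ]

# Left join on both keys: every row of df1 is kept, unmatched weeks get NA
out <- merge(df1, df2[c("id", "year_month_week", "temperature")],
             by = c("id", "year_month_week"), all.x = TRUE)
print(out)
```

With the sample data this keeps all seven rows of df1 and yields the temperature column from the expected output in the question.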