Home > Back-end >  Merging datasets based on date column
Merging datasets based on date column

Time:02-10

Hello Stack Community,

I am trying to merge two datasets based on the date column in each. Below are the snapshots of each dataset, in which, you can clearly see that date column only is formatted as year-month-day, with no specification for hour:minute:second.

dataset 1; snapshot of date column

dataset 2; snapshot of date column

I am merging the two datasets using the following code:

final <- merge(dataset1, dataset2, by.x="close_date", by.y = "date",all.x = TRUE, all.y = TRUE)

Below is another snapshot which shows the result of the code above. As you can see, somehow the hour:minute:second appears in date column which dirsputs the merge.

result of the merge

Is there a way to merge the two columns properly? Maybe someone could suggest the function that would set hh:mm:ss to 00:00:00? Btw, beforehand in the script I have both columns "close_date" from dataset1 and "date" from dataset2 formatted in the same way using the as.POSIXct function.

You can check the R output for head() below:

 head(dataset1) %>%
   select(close_date)
 
 A tibble: 6 x 1
  close_date         
  <dttm>             
1 2020-03-01 00:00:00
2 2020-03-02 00:00:00
3 2020-03-03 00:00:00
4 2020-03-04 00:00:00
5 2020-03-05 00:00:00
6 2020-03-06 00:00:00


 head(dataset2) %>%
   select(date)

 A tibble: 6 x 1
  date               
  <dttm>             
 1 2020-03-01 00:00:00
 2 2020-03-02 00:00:00
 3 2020-03-03 00:00:00
 4 2020-03-04 00:00:00
 5 2020-03-05 00:00:00
 6 2020-03-06 00:00:00

CodePudding user response:

With the provided data (in future please provide data directly with dput(head(dataset1)). See here How to make a great R reproducible example

We could define the date format in each dataset before merging using ymd function of lubridate package:

library(lubridate)

dataset1$close_date <- ymd(dataset1$close_date)
dataset2$date <- ymd(dataset2$date)

final <- merge(dataset1, dataset2, by.x="close_date", by.y = "date",all.x = TRUE, all.y = TRUE)
final
  close_date    coin case_count_world_
1 2020-03-01 BTCUSDT                  
2 2020-03-02 BTCUSDT                  
3 2020-03-03 BTCUSDT                  
4 2020-03-04 BTCUSDT                  
5 2020-03-05 BTCUSDT                  
6 2020-03-06    <NA>                  
7 2020-03-07    <NA>                  
8 2020-03-08    <NA>  

data:

dataset1 <- structure(list(close_date = structure(c(18322, 18323, 18324, 
18325, 18326), class = "Date"), coin = c("BTCUSDT", "BTCUSDT", 
"BTCUSDT", "BTCUSDT", "BTCUSDT")), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

dataset2 <- structure(list(date = structure(c(18322, 18323, 18324, 18325, 
18326, 18327, 18328, 18329), class = "Date"), case_count_world_ = c("", 
"", "", "", "", "", "", "")), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))
  • Related