Hello Stack Community,
I am trying to merge two datasets based on the date column in each. Below are the snapshots of each dataset, in which, you can clearly see that date column only is formatted as year-month-day, with no specification for hour:minute:second.
dataset 1; snapshot of date column
dataset 2; snapshot of date column
I am merging the two datasets using the following code:
final <- merge(dataset1, dataset2, by.x="close_date", by.y = "date",all.x = TRUE, all.y = TRUE)
Below is another snapshot which shows the result of the code above. As you can see, somehow the hour:minute:second appears in date column which dirsputs the merge.
Is there a way to merge the two columns properly? Maybe someone could suggest the function that would set hh:mm:ss to 00:00:00? Btw, beforehand in the script I have both columns "close_date" from dataset1 and "date" from dataset2 formatted in the same way using the as.POSIXct
function.
You can check the R output for head() below:
head(dataset1) %>%
select(close_date)
A tibble: 6 x 1
close_date
<dttm>
1 2020-03-01 00:00:00
2 2020-03-02 00:00:00
3 2020-03-03 00:00:00
4 2020-03-04 00:00:00
5 2020-03-05 00:00:00
6 2020-03-06 00:00:00
head(dataset2) %>%
select(date)
A tibble: 6 x 1
date
<dttm>
1 2020-03-01 00:00:00
2 2020-03-02 00:00:00
3 2020-03-03 00:00:00
4 2020-03-04 00:00:00
5 2020-03-05 00:00:00
6 2020-03-06 00:00:00
CodePudding user response:
With the provided data (in future please provide data directly with dput(head(dataset1))
. See here How to make a great R reproducible example
We could define the date format in each dataset before merging using ymd
function of lubridate
package:
library(lubridate)
dataset1$close_date <- ymd(dataset1$close_date)
dataset2$date <- ymd(dataset2$date)
final <- merge(dataset1, dataset2, by.x="close_date", by.y = "date",all.x = TRUE, all.y = TRUE)
final
close_date coin case_count_world_
1 2020-03-01 BTCUSDT
2 2020-03-02 BTCUSDT
3 2020-03-03 BTCUSDT
4 2020-03-04 BTCUSDT
5 2020-03-05 BTCUSDT
6 2020-03-06 <NA>
7 2020-03-07 <NA>
8 2020-03-08 <NA>
data:
dataset1 <- structure(list(close_date = structure(c(18322, 18323, 18324,
18325, 18326), class = "Date"), coin = c("BTCUSDT", "BTCUSDT",
"BTCUSDT", "BTCUSDT", "BTCUSDT")), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
dataset2 <- structure(list(date = structure(c(18322, 18323, 18324, 18325,
18326, 18327, 18328, 18329), class = "Date"), case_count_world_ = c("",
"", "", "", "", "", "", "")), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))