I tried to calculate the difference between dates in months for a huge database, using dbplyr package which sends my codes to the sql query. I tried different ways suggested in similar questions, but they didn't work (for example the codes below). Data2 is an object created by tbl()
Data2 %>% mutate(age_Months = (lubridate::interval(date_1, date_2)) / base::months(1))
#OR
library(lubridate)
Data2 %>% mutate(age_Months = TIMESTAMPDIFF(MONTH, date_1, date_2))
Do you have any ideas to calculate the differences? Feel free to suggest any other libraries and functions, it's not matter. I just need the differences in month. Thank you
The data set is as below.
Data2 %>% select(date_1, date_2) %>% head() %>% collect()
# date_1 date_2
# <chr> <chr>
#1 2015-05-01 00:00:00.0 2036-02-01 00:00:00.0
#2 2015-05-01 00:00:00.0 2036-01-01 00:00:00.0
#3 2015-05-01 00:00:00.0 2031-03-01 00:00:00.0
#4 2015-05-01 00:00:00.0 2035-12-01 00:00:00.0
#5 2015-05-01 00:00:00.0 2035-05-01 00:00:00.0
#6 2015-05-01 00:00:00.0 2032-03-01 00:00:00.0
I transformed them by as.Date()
but no difference happened in result.
Data2 %>% mutate(date_1 = as.Date(date_1),
date_2 = as.Date(date_2))
# date_1 date_2
# <date> <date>
#1 2018-08-01 2036-02-01
#2 2018-08-01 2036-06-01
#3 2018-08-01 2036-01-01
#4 2018-08-01 2034-08-01
#5 2018-08-01 2033-08-01
#6 2018-08-01 2035-03-01
CodePudding user response:
Here are two possible approaches.
1) Using lubridate
dbplyr translations are defined for some (but not all) lubridate functions into some (but not all) SQL flavors (when last I tested). So I would start by trying the following:
Data2 %>%
mutate(year1 = lubridate::year(date_1),
year2 = lubridate::year(date_2),
month1 = lubridate::month(date_1),
month2 = lubridate::month(date_2)) %>%
mutate(age_months = 12*(year2 - year1) month2 - month1) %>%
select(-year1, -year2, -month1, -month2)
2) Using untranslated SQL functions
Impala includes a DATEDIFF
function (documentation here). When dbplyr encounters a comand that it does not have a translation defined for, it passes the command untranslated through to the database.
So something like the following:
Data2 %>%
mutate(age_months = DATEDIFF(date_2, date_1) / 30)
should be translated into SQL like this:
SELECT *
,DATEDIFF(date_2, date_1) / 30 AS age_months
FROM data2