I have a data set and there are some date and hour attributes. Here is the sample, then I will explain what I want to do;
date1 | hour1 | date2 | hour2 | date3 | hour3 |
---|---|---|---|---|---|
2014-03-16 00:00:00 | 16:20:00 | 2014-03-16 00:00:00 | 20:20:03 | 2014-03-16 00:00:00 | 22:12:34 |
2014-04-22 00:00:00 | 10:20:00 | 2014-04-22 00:00:00 | 15:20:03 | 2014-04-22 00:00:00 | 20:12:34 |
2015-03-12 00:00:00 | 16:20:00 | 2015-03-12 00:00:00 | 20:20:03 | 2015-03-12 00:00:00 | 22:12:34 |
We know event1 happens before event2 (event1 -> event2 -> event3)
But as you see, in the date attributes, time section is not correct yet we have hour attributes for each. What I want to do; I want to correct them by using hour attributes then find the difference between these two dates and create new attributes that gives the time difference as hours.
Sample for above table;
event2_time |
---|
4 |
5 |
4 |
I tried to merge hour to date and create a new attribute like this but it doesn't work.(my goal is actually correct the date value and get rid of the hour attribute)
trainTable <- trainTable %>%
mutate("newParam" = as.POSIXct(paste(alert_date, alert_hour), format="%Y-%m-%d %H:%M:%S")
I could use some help, thanks in advance.
CodePudding user response:
Try this solution using mapply
. It uses strsplit
to split the date from the hours.
dat_new <- data.frame( setNames( mapply( function(x,y){
tmp <- sapply( strsplit( dat[,x], " "), function(z) z[1] );
list( as.POSIXct( paste(tmp,dat[,y] ) ) ) },
grep("date", colnames(dat)), grep("hour", colnames(dat)) ),
c("a","b","c") ) )
dat_new$b - dat_new$a
Time differences in hours
[1] 4.000833 5.000833 4.000833
Data
dat <- structure(list(date1 = c("2014-03-16 00:00:00", "2014-04-22 00:00:00",
"2015-03-12 00:00:00"), hour1 = c("16:20:00", "10:20:00", "16:20:00"
), date2 = c("2014-03-16 00:00:00", "2014-04-22 00:00:00", "2015-03-12 00:00:00"
), hour2 = c("20:20:03", "15:20:03", "20:20:03"), date3 = c("2014-03-16 00:00:00",
"2014-04-22 00:00:00", "2015-03-12 00:00:00"), hour3 = c("22:12:34",
"20:12:34", "22:12:34")), class = "data.frame", row.names = c(NA,
-3L))