R: How to convert a character to a numeric value without creating NAs or NaNs when dealing with time

Time:05-21

What I need to do:

I have a data frame in which I am trying to find the mean of ride_duration in three cases: the overall mean, the mean for "customer", and the mean for "subscriber" in user_type.

When running head() this is what I see.

head(all_trips)
# A tibble: 6 × 8
   trip_id start_time      end_time     bike_id from_station_id to_station_id user_type ride_duration
     <dbl> <chr>           <chr>          <dbl>           <dbl>         <dbl> <chr>     <chr>        
1 22081306 3/18/2019 22:36 3/18/2019 2…    4836             233           233 Customer  0:01:01      
2 21965689 2/25/2019 19:44 2/25/2019 1…    5568             623           623 Customer  0:01:01      
3 21763499 1/5/2019 7:49   1/5/2019 7:…    5905              45            45 Customer  0:01:01      
4 21942654 2/20/2019 12:40 2/20/2019 1…    5940               3             3 Customer  0:01:01      
5 21839414 1/18/2019 16:22 1/18/2019 1…    3925             424           424 Customer  0:01:03      
6 22118075 3/23/2019 15:54 3/23/2019 1…    6025              34            34 Customer  0:01:04      

When trying to convert ride_duration to numeric, it changes the entire column to NAs. The same thing happens when I try to convert start_time or end_time to numeric so I can then use difftime() to come up with a new ride_duration column.

What I have tried:

all_trips$ride_duration <- as.numeric(as.character(all_trips$ride_duration))
Warning message:
NAs introduced by coercion 
> all_trips$ride_duration <- as.numeric(paste(all_trips$ride_duration))
Warning message:
NAs introduced by coercion 
transform(all_trips, ride_duration= as.numeric(ride_duration))

The one above just returned a preview of the table with NAs all down ride_duration.

Then I tried to remove the : symbols and then convert to numeric.

all_trips$ride_duration <-gsub(":","",as.character(all_trips$ride_duration))
> head(all_trips)
# A tibble: 6 × 8
   trip_id start_time      end_time     bike_id from_station_id to_station_id user_type ride_duration
     <dbl> <chr>           <chr>          <dbl>           <dbl>         <dbl> <chr>     <chr>        
1 22081306 3/18/2019 22:36 3/18/2019 2…    4836             233           233 Customer  00101        
2 21965689 2/25/2019 19:44 2/25/2019 1…    5568             623           623 Customer  00101        
3 21763499 1/5/2019 7:49   1/5/2019 7:…    5905              45            45 Customer  00101        
4 21942654 2/20/2019 12:40 2/20/2019 1…    5940               3             3 Customer  00101        
5 21839414 1/18/2019 16:22 1/18/2019 1…    3925             424           424 Customer  00103        
6 22118075 3/23/2019 15:54 3/23/2019 1…    6025              34            34 Customer  00104        
> all_trips$ride_duration <- as.numeric(as.character(all_trips$ride_duration))
Warning message:
NAs introduced by coercion 
> head(all_trips)
# A tibble: 6 × 8
   trip_id start_time      end_time     bike_id from_station_id to_station_id user_type ride_duration
     <dbl> <chr>           <chr>          <dbl>           <dbl>         <dbl> <chr>             <dbl>
1 22081306 3/18/2019 22:36 3/18/2019 2…    4836             233           233 Customer            101
2 21965689 2/25/2019 19:44 2/25/2019 1…    5568             623           623 Customer            101
3 21763499 1/5/2019 7:49   1/5/2019 7:…    5905              45            45 Customer            101
4 21942654 2/20/2019 12:40 2/20/2019 1…    5940               3             3 Customer            101
5 21839414 1/18/2019 16:22 1/18/2019 1…    3925             424           424 Customer            103
6 22118075 3/23/2019 15:54 3/23/2019 1…    6025              34            34 Customer            104
> mean(all_trips$ride_duration)
[1] NA
> mean(all_trips$ride_duration, na.rm = TRUE)
[1] 2989.162
> 

I still got a warning telling me that some values became NAs. I am also concerned because converting to numeric removed the leading zeroes from my HH:MM:SS time format, so the mean may be calculated incorrectly.

I have also tried converting my original 4 dataframes q1-4 into dates in a new column with:

 q1$date <- as.Date(q1$start_time, "%m/%d/%Y %H:%M:%S")

but it returns values of NAs

To repeat the question: how can I convert ride_duration to numeric without creating NAs?

Also, is there a way to get back what I had before running as.numeric() without rerunning all of my code for loading and binding the data frames?

CodePudding user response:

Use as.ITime() from data.table:

times <- c('0:01:01', '0:01:01', '0:01:01', '0:01:01', '0:01:03', '0:01:04')
as.numeric(data.table::as.ITime(times))
[1] 61 61 61 61 63 64
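
To get from there to the three means asked for in the question, a minimal sketch could look like the one below. The small `trips` data frame is made-up illustration data, not the asker's real table, and the sketch assumes every duration is shorter than 24 hours, since ITime represents a time of day.

```r
library(data.table)

# Hypothetical sample data standing in for all_trips
trips <- data.frame(
  user_type     = c("Customer", "Customer", "Subscriber", "Subscriber"),
  ride_duration = c("0:01:01", "0:01:03", "0:10:00", "1:00:00"),
  stringsAsFactors = FALSE
)

# as.ITime() parses "H:MM:SS"; as.numeric() then yields seconds
trips$duration_secs <- as.numeric(as.ITime(trips$ride_duration))

# Overall mean and mean per user type, both in seconds
overall_mean <- mean(trips$duration_secs, na.rm = TRUE)
mean_by_type <- tapply(trips$duration_secs, trips$user_type, mean, na.rm = TRUE)
```

Writing the result to a new column such as `duration_secs` (a name chosen here for illustration) leaves the original character column untouched, so a failed conversion does not destroy the source data.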

CodePudding user response:

Convert start and end times to use difftime()

To use difftime() on the end and start times, try converting them to POSIXct via something like

 all_trips$start_time_posix <- as.POSIXct(all_trips$start_time, format = "%m/%d/%Y %H:%M")

as also seen in other discussions.
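
As a rough sketch of that route, here are two made-up rows in the same layout as the question's data. The explicit `tz = "UTC"` is an assumption, because the question does not say which time zone the timestamps use. The format string has no `%S` because the timestamps shown carry no seconds. A format that expects seconds will not match such strings and returns NA.

```r
# Hypothetical rows in the same "m/d/Y H:M" layout as the question's data
trips <- data.frame(
  start_time = c("3/18/2019 22:36", "1/5/2019 7:49"),
  end_time   = c("3/18/2019 22:37", "1/5/2019 8:19"),
  stringsAsFactors = FALSE
)

fmt <- "%m/%d/%Y %H:%M"
start_posix <- as.POSIXct(trips$start_time, format = fmt, tz = "UTC")
end_posix   <- as.POSIXct(trips$end_time,   format = fmt, tz = "UTC")

# difftime() returns a difftime object; as.numeric() turns it into plain seconds
trips$ride_duration_secs <- as.numeric(difftime(end_posix, start_posix, units = "secs"))
```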

Convert ride_duration directly

To convert the ride_duration to a numeric, we first have to know the units and then do something like this:

duration_secs <- sum(as.numeric(strsplit(ride_duration, ":")[[1]]) * c(3600, 60, 1))

This just works on a single value, but the pattern is - assuming ride_duration is in HOURS:MINUTES:SECONDS and the desired unit is seconds - to first split the string on the ":" and then multiply each of the resulting numbers to get three values in the desired unit and then sum them up.
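
The same pattern can be applied to a whole column in base R. This is only a sketch: `hms_to_secs` is a hypothetical helper name, and it assumes every value has exactly three colon-separated fields.

```r
# Hypothetical helper: convert "H:MM:SS" strings to seconds
hms_to_secs <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  # One hour is 3600 seconds, one minute is 60 seconds
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

times <- c("0:01:01", "0:01:03", "1:00:00", "25:30:00")
secs <- hms_to_secs(times)
# secs is 61, 63, 3600, 91800
```

Because this is plain arithmetic rather than time-of-day parsing, it also handles durations of 24 hours or more.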

Because I am too used to data.table, I would do this for the whole column as

library(data.table)

all_trips <- as.data.table(all_trips)

then add the columns of the individual units

all_trips[, c("duration_h", "duration_min", "duration_sec") := 
  tstrsplit(ride_duration, ":", fixed = TRUE, type.convert = TRUE)]

and then sum them up in a new column

all_trips[, duration_total := 
  duration_h * 3600 + duration_min * 60 + duration_sec]

Afterwards, you can delete the helper columns one at a time with

all_trips[, duration_h := NULL]

or all at once like

colsToDelete <-  c("duration_h", "duration_min", "duration_sec")
all_trips[,  (colsToDelete) := NULL]

There are likely more elegant ways that avoid inserting a column for each time unit, but maybe this is a sufficient and intelligible example to get you running.
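
One possible variant, sketched here with made-up data, does the split and the sum in a single step so that no helper columns need deleting afterwards. It assumes the durations are in HOURS:MINUTES:SECONDS and that `trips` stands in for the real table.

```r
library(data.table)

# Hypothetical sample data standing in for all_trips
trips <- data.table(
  user_type     = c("Customer", "Customer", "Subscriber"),
  ride_duration = c("0:01:01", "0:01:03", "1:00:00")
)

# Split once, then combine the three pieces without storing them as columns
trips[, duration_total := {
  p <- tstrsplit(ride_duration, ":", fixed = TRUE, type.convert = TRUE)
  p[[1]] * 3600 + p[[2]] * 60 + p[[3]]
}]

# Mean duration in seconds for each user type
means <- trips[, .(mean_secs = mean(duration_total, na.rm = TRUE)), by = user_type]
```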
