recode a time variable (format: hh:mm:ss) into a categorical variable

Time:11-15

I have a variable named Duration.video, stored as hh:mm:ss, that I would like to recode into a categorical variable ('Less than 5 minutes', 'Between 5 and 30 minutes', etc.).

Here is my line of code:

video$Duration.video<-as.factor(car::recode(
  video$Duration.video, 
  "00:00:01:00:04:59='Less than 5 minutes';00:05:00:00:30:00='Between 5 and 30 minutes';00:30:01:01:59:59='More than 30 minutes and less than 2h';02:00:00:08:00:00='2h and more'"
))

The code does not work because all the values of the variable are put in one category ('Between 5 and 30 minutes').

I think it's because my variable is in character format but I can't convert it to numeric. And also maybe the format with ":" can be a problem for the recoding in R.

I tried to convert to data.table::ITime but the result remains the same.

CodePudding user response:

This is a tidyverse solution using lubridate and dplyr. You can get this done with base R, but this may be easier to read.

library(lubridate)
library(dplyr)

df <- data.frame(
  duration_string = c("00:00:03","00:00:06","00:12:00","00:31:00","01:12:01")
  )

df <- df %>%
  mutate(
    # parse the "hh:mm:ss" strings into lubridate durations
    duration = as.duration(hms(duration_string)),
    cat_duration = case_when(
      duration < dseconds(5) ~ "less than 5 secs",
      duration >= dseconds(5) & duration < dminutes(30) ~ "between 5 secs and 30 mins",
      duration >= dminutes(30) & duration < dhours(1) ~ "between 30 mins and 1 hour",
      # >= rather than > so that exactly 1 hour does not fall through to NA
      duration >= dhours(1) ~ "1 hour or more"
    ),
    # set the level order explicitly so categories sort by length, not alphabetically
    cat_duration = factor(
      cat_duration,
      levels = c(
        "less than 5 secs",
        "between 5 secs and 30 mins",
        "between 30 mins and 1 hour",
        "1 hour or more"
      )
    )
  )
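
The breakpoints above are only illustrative. A minimal sketch of the same approach with the categories from the question might look like the following. It assumes the boundaries are meant as written there (5 and 30 minutes inclusive in the second category, 2h inclusive in the last); the sample values and the `secs` and `Duration.cat` column names are made up for illustration.

```r
library(lubridate)
library(dplyr)

# Illustrative data; `video` and `Duration.video` follow the names in the question
video <- data.frame(
  Duration.video = c("00:00:01", "00:04:59", "00:05:00",
                     "00:30:00", "00:30:01", "02:00:00")
)

labs <- c("Less than 5 minutes",
          "Between 5 and 30 minutes",
          "More than 30 minutes and less than 2h",
          "2h and more")

video <- video %>%
  mutate(
    # hms() parses "hh:mm:ss" into a Period; convert it to plain seconds
    secs = period_to_seconds(hms(Duration.video)),
    # conditions are checked in order, so each one only needs its upper bound
    Duration.cat = case_when(
      secs < 5 * 60    ~ labs[1],
      secs <= 30 * 60  ~ labs[2],
      secs < 2 * 3600  ~ labs[3],
      secs >= 2 * 3600 ~ labs[4]  # explicit test so unparsed (NA) values stay NA
    ),
    Duration.cat = factor(Duration.cat, levels = labs)
  )
```

Writing the result to a new column instead of overwriting Duration.video keeps the original strings available for checking the bins.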

CodePudding user response:

We can use factor. This only uses base R:

labs <- c('Less than 5 minutes', 
      'Between 5 and 30 minutes', 
      'More than 30 minutes and less than 2h', 
      '2h and more')
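transform(df, factor = {
  # zero-padded "hh:mm:ss" strings compare in chronological order
  hms <- substr(duration_string, 1, 8)
  # each TRUE adds 1, giving a bin index from 0 to 3
  factor((hms >= "00:05:00") + (hms > "00:30:00") + (hms >= "02:00:00"), 0:3, labs)
})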
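
The string comparison works because every value is zero-padded to the same width, so alphabetical order matches chronological order. If the strings might not be padded consistently, one option is to convert them to seconds first and apply the same sum-of-comparisons trick to numbers. This is only a sketch: `hms_to_seconds` is a hypothetical helper, not a base R function, and the sample values are made up.

```r
# Hypothetical helper: convert "hh:mm:ss" strings to a number of seconds
hms_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

durations <- c("00:00:03", "00:04:59", "00:05:00", "00:30:00",
               "00:30:01", "01:59:59", "02:00:00")
secs <- hms_to_seconds(durations)

labs <- c("Less than 5 minutes",
          "Between 5 and 30 minutes",
          "More than 30 minutes and less than 2h",
          "2h and more")

# Each comparison that is TRUE adds 1, so the sum is a bin index from 0 to 3
duration_cat <- factor((secs >= 300) + (secs > 1800) + (secs >= 7200),
                       levels = 0:3, labels = labs)
```

Here `table(duration_cat)` gives a quick count per category, including empty ones.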