I have a variable named Duration.video, stored in hh:mm:ss format, that I would like to recode into a categorical variable ('Less than 5 minutes', 'Between 5 and 30 minutes', etc.).
Here is my line of code:
video$Duration.video<-as.factor(car::recode(
video$Duration.video,
"00:00:01:00:04:59='Less than 5 minutes';00:05:00:00:30:00='Between 5 and 30 minutes';00:30:01:01:59:59='More than 30 minutes and less than 2h';02:00:00:08:00:00='2h and more'"
))
The code does not work: every value of the variable ends up in a single category ('Between 5 and 30 minutes').
I think this is because my variable is stored as character, but I can't convert it to numeric. The ":" in the values may also be a problem for recoding in R.
I tried converting it to data.table::ITime, but the result remains the same.
CodePudding user response:
This is a tidyverse solution. It can also be done in base R, but this may be easier.
library(lubridate)
library(dplyr)

df <- data.frame(
  duration_string = c("00:00:03", "00:00:06", "00:12:00", "00:31:00", "01:12:01")
)

df <- df %>%
  mutate(
    # parse "hh:mm:ss" into a duration so it can be compared numerically
    duration = as.duration(hms(duration_string)),
    cat_duration = case_when(
      duration < dseconds(5) ~ "less than 5 secs",
      duration >= dseconds(5) & duration < dminutes(30) ~ "between 5 secs and 30 mins",
      duration >= dminutes(30) & duration < dhours(1) ~ "between 30 mins and 1 hour",
      # >= so that a duration of exactly 1 hour is not left as NA
      duration >= dhours(1) ~ "1 hour or more"
    ),
    # fix the level order so the categories sort by length, not alphabetically
    cat_duration = factor(cat_duration, levels = c(
      "less than 5 secs",
      "between 5 secs and 30 mins",
      "between 30 mins and 1 hour",
      "1 hour or more"
    ))
  )
CodePudding user response:
We can use factor. This only uses base R:
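labs <- c('Less than 5 minutes',
          'Between 5 and 30 minutes',
          'More than 30 minutes and less than 2h',
          '2h and more')
transform(df, factor = {
  # zero-padded "hh:mm:ss" strings sort chronologically, so they can be compared as text
  hms <- substr(duration_string, 1, 8)
  # each TRUE adds 1, giving a code from 0 to 3 that indexes into labs
  factor((hms >= "00:05:00") + (hms > "00:30:00") + (hms >= "02:00:00"), 0:3, labs)
})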
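If you would rather compare numbers than strings, here is a minimal base R sketch of the same idea: convert each "hh:mm:ss" value to seconds, then bin with cut(). The helper to_seconds(), the sample data and the column name cat_duration are made up for illustration. The bin edges assume whole seconds and follow the categories in the question.

```r
# Hypothetical sample data; duration_string mirrors the column used above.
df <- data.frame(
  duration_string = c("00:00:03", "00:04:59", "00:05:00", "00:30:00",
                      "00:30:01", "01:59:59", "02:00:00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split "hh:mm:ss" on ":" and weight hours, minutes, seconds.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

labs <- c('Less than 5 minutes',
          'Between 5 and 30 minutes',
          'More than 30 minutes and less than 2h',
          '2h and more')

secs <- to_seconds(df$duration_string)

# Left-closed bins: [0, 300), [300, 1801), [1801, 7200), [7200, Inf).
# The 1801 break keeps exactly 30:00 in the second category (whole seconds assumed).
df$cat_duration <- cut(secs,
                       breaks = c(0, 300, 1801, 7200, Inf),
                       labels = labs,
                       right = FALSE)
```

cut() returns a factor whose levels follow the order of labs, so no separate factor() call is needed. Values that do not parse as three numeric parts would come out as NA with a warning.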