I need to create a new column to categorize my experiment. The time series data is divided into 10-minute groups using cut. I want to add a column that cycles through 3 labels, say A, B, C, and then goes back to A.
test<- tibble(date_time = seq.POSIXt(ymd_hms('2020-02-02 00:00:00'),
ymd_hms('2020-02-02 01:00:00'),
by= '30 sec' ),
cat = cut(date_time, breaks = '10 min'))
I want to get something like this:
date_time cat
<dttm> <fct>
1 2020-02-02 00:00:00 A
2 2020-02-02 00:05:30 A
3 2020-02-02 00:10:00 B
4 2020-02-02 00:20:30 C
5 2020-02-02 00:30:00 A
6 2020-02-02 00:31:30 A
I have used the labels option in cut before with a known number of factors, but not like this. Any help is welcome.
CodePudding user response:
You could use cut(labels = F) to create numeric labels for your intervals, then use these as an index into the built-in LETTERS vector (or a vector of custom labels). The modulo operator and a little arithmetic will make it cycle through A, B, and C:
library(tidyverse)
library(lubridate)
test<- tibble(date_time = seq.POSIXt(ymd_hms('2020-02-02 00:00:00'),
ymd_hms('2020-02-02 01:00:00'),
by= '30 sec' ),
cat_num = cut(date_time, breaks = '10 min', labels = F),
cat = LETTERS[((cat_num - 1) %% 3) + 1]
)
date_time cat_num cat
<dttm> <int> <chr>
1 2020-02-02 00:00:00 1 A
2 2020-02-02 00:00:30 1 A
3 2020-02-02 00:01:00 1 A
4 2020-02-02 00:01:30 1 A
5 2020-02-02 00:02:00 1 A
6 2020-02-02 00:02:30 1 A
7 2020-02-02 00:03:00 1 A
8 2020-02-02 00:03:30 1 A
9 2020-02-02 00:04:00 1 A
10 2020-02-02 00:04:30 1 A
...48 more rows...
59 2020-02-02 00:29:00 3 C
60 2020-02-02 00:29:30 3 C
61 2020-02-02 00:30:00 4 A
62 2020-02-02 00:30:30 4 A
...59 more rows...
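If you want your own labels instead of LETTERS, the same indexing trick should carry over. Below is a minimal base R sketch, assuming a hypothetical label vector named my_labels (the names in it are made up for illustration). Using length(my_labels) in place of the hard-coded 3 means the cycle adapts if you add or remove labels.

```r
# Hypothetical custom labels; any character vector should work here
my_labels <- c("control", "treatment", "washout")

# Same 30-second series as in the question, built with base R only
date_time <- seq(as.POSIXct("2020-02-02 00:00:00", tz = "UTC"),
                 as.POSIXct("2020-02-02 01:00:00", tz = "UTC"),
                 by = "30 sec")

# Integer codes 1, 2, 3, ... identifying each 10-minute interval
cat_num <- cut(date_time, breaks = "10 min", labels = FALSE)

# Shift to 0-based, wrap with modulo, shift back to 1-based for R indexing
cat <- my_labels[(cat_num - 1) %% length(my_labels) + 1]

# Cross-tabulate interval number against label to check the cycling
table(cat_num, cat)
```

The subtract-one, add-one step is needed because R vectors are 1-indexed: without it, every third interval would give index 0, which silently drops the element instead of returning the last label.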