I am pretty new to R so I was trying to figure out how can I do this better. I have a data table that consist of two columns, (Day and Sleepstatus). How can I find the first iteration of Sleeping and Awake based on the column day and mutate another column to indicate when the person start sleep (1st row of sleeping) and stop sleep (1st row of awake). The rest of the duration of sleeping, the column should show N.A.
Day | SleepStatus |
---|---|
1 | Sleeping |
1 | Sleeping |
1 | Sleeping |
1 | Awake |
2 | Sleeping |
2 | Sleeping |
2 | Sleeping |
2 | Awake |
Desired Output
Day | SleepStatus | Final Status |
---|---|---|
1 | Sleeping | Start Sleep |
1 | Sleeping | NA |
1 | Sleeping | NA |
1 | Awake | Stop Sleep |
2 | Sleeping | Start Sleep |
2 | Sleeping | NA |
2 | Sleeping | NA |
2 | Awake | Stop Sleep |
CodePudding user response:
This is one potential solution:
library(data.table)
dt <- data.table::data.table(
Day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
SleepStatus = c("Sleeping","Sleeping","Sleeping",
"Awake","Sleeping","Sleeping","Sleeping","Awake")
)
dt[, `Final Status` := {ifelse(
cumsum(SleepStatus != "Sleeping") != shift(cumsum(SleepStatus != "Sleeping"), fill = 0, type = "lag"),
"Stop Sleep", "Start Sleep")}]
dt[, `Final Status` := {ifelse(
`Final Status` == shift(`Final Status`, fill = "NA", type = "lag"),
NA, `Final Status`)}]
dt
#> Day SleepStatus Final Status
#> 1: 1 Sleeping Start Sleep
#> 2: 1 Sleeping <NA>
#> 3: 1 Sleeping <NA>
#> 4: 1 Awake Stop Sleep
#> 5: 2 Sleeping Start Sleep
#> 6: 2 Sleeping <NA>
#> 7: 2 Sleeping <NA>
#> 8: 2 Awake Stop Sleep
The code makes a lot more sense if you break it down into smaller chunks. I've done this with tidyverse functions below, as I feel it's easier to understand, but I can change it to data.table syntax if you would like me to.
library(data.table)
dt <- data.table::data.table(
Day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
SleepStatus = c("Sleeping","Sleeping","Sleeping",
"Awake","Sleeping","Sleeping","Sleeping","Awake")
)
library(tidyverse)
df <- as.data.frame(dt)
# When the Sleepstatus is not "Sleeping", increment the variable by one
df2 <- df %>%
mutate(Sleeping = cumsum(SleepStatus != "Sleeping"))
df2
#> Day SleepStatus Sleeping
#> 1 1 Sleeping 0
#> 2 1 Sleeping 0
#> 3 1 Sleeping 0
#> 4 1 Awake 1
#> 5 2 Sleeping 1
#> 6 2 Sleeping 1
#> 7 2 Sleeping 1
#> 8 2 Awake 2
# If the previous value in "Sleeping" is different to the current value,
# add the "stop sleeping" flag (i.e. show when "Sleeping" changes)
df3 <- df2 %>%
mutate(Sleep_label = ifelse(Sleeping != lag(Sleeping, default = 0), "Stop sleeping", "Start sleeping"))
df3
#> Day SleepStatus Sleeping Sleep_label
#> 1 1 Sleeping 0 Start sleeping
#> 2 1 Sleeping 0 Start sleeping
#> 3 1 Sleeping 0 Start sleeping
#> 4 1 Awake 1 Stop sleeping
#> 5 2 Sleeping 1 Start sleeping
#> 6 2 Sleeping 1 Start sleeping
#> 7 2 Sleeping 1 Start sleeping
#> 8 2 Awake 2 Stop sleeping
# Then, if the value in Sleep_label is equal to the previous label,
# change it to NA
df4 <- df3 %>%
mutate(Final_status = ifelse(Sleep_label == lag(Sleep_label, default = "NA"), NA, Sleep_label))
df4
#> Day SleepStatus Sleeping Sleep_label Final_status
#> 1 1 Sleeping 0 Start sleeping Start sleeping
#> 2 1 Sleeping 0 Start sleeping <NA>
#> 3 1 Sleeping 0 Start sleeping <NA>
#> 4 1 Awake 1 Stop sleeping Stop sleeping
#> 5 2 Sleeping 1 Start sleeping Start sleeping
#> 6 2 Sleeping 1 Start sleeping <NA>
#> 7 2 Sleeping 1 Start sleeping <NA>
#> 8 2 Awake 2 Stop sleeping Stop sleeping
Created on 2022-05-20 by the reprex package (v2.0.1)
Does that make sense? Or did I just make things more confusing?
CodePudding user response:
in Base R you could do the following:
x <- dt$SleepStatus
is.na(x) <- -cumsum(c(1,head(rle(x)$lengths,-1)))
dt$final_status <- c(Sleeping = 'Start Sleep', Awake = 'Stop Sleep')[x]
dt
Day SleepStatus final_status
1 1 Sleeping Start Sleep
2 1 Sleeping <NA>
3 1 Sleeping <NA>
4 1 Awake Stop Sleep
5 2 Sleeping Start Sleep
6 2 Sleeping <NA>
7 2 Sleeping <NA>
8 2 Awake Stop Sleep