I have binary data in a data frame with a time column, and I want to produce a data frame like the one below, with a new column "duration since =1". I found the Python equivalent of this answer here, but I am looking for a way to do this in R.
Binary Output   Time (secs)   duration since =1
0               0             0
0               0.000983      0.000983
0               0.001966      0.001966
1               0.002949      0
0               0.003932      0.000983   # (0.003932-0.002949)
0               0.005000      0.002051   # (0.005000-0.002949)
CodePudding user response:
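We can use cumsum to decide, row by row, whether to subtract the Time at which Binary_Output == 1. Where cumsum(Binary_Output) == 0, every row so far has Binary_Output equal to 0, so no 1 has occurred yet and Time is kept unchanged. In all other rows we subtract the Time of the row where Binary_Output == 1. Note that Time[Binary_Output == 1] is a single value only when exactly one row has Binary_Output == 1, as in the sample data.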
library(dplyr)

df <- read.table(header = TRUE, text = "Binary_Output Time
0 0
0 0.000983
0 0.001966
1 0.002949
0 0.003932
0 0.005000")

df %>%
  mutate(duration = ifelse(cumsum(Binary_Output) == 0, Time, Time - Time[Binary_Output == 1]))
#>   Binary_Output     Time duration
#> 1             0 0.000000 0.000000
#> 2             0 0.000983 0.000983
#> 3             0 0.001966 0.001966
#> 4             1 0.002949 0.000000
#> 5             0 0.003932 0.000983
#> 6             0 0.005000 0.002051
Created on 2022-05-05 by the reprex package (v2.0.1)
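If the real data can contain more than one row with Binary_Output == 1, the duration probably needs to restart at each of them. The following is a minimal base R sketch of that idea, not part of the original answer. The helper name duration_since_one is hypothetical, and it assumes the binary column has no NA values and the rows are already sorted by time.

```r
# Hypothetical helper: time elapsed since the most recent row where binary == 1.
# Rows before the first 1 keep their original time, as in the question's table.
# Assumes `binary` has no NA values and rows are sorted by `time`.
duration_since_one <- function(binary, time) {
  # Row index of the latest 1 seen so far (0 while no 1 has occurred yet)
  last_event <- cummax(seq_along(binary) * (binary == 1))
  # Reference time: 0 before the first 1, otherwise the time of the latest 1
  ref <- c(0, time)[last_event + 1]
  time - ref
}

df <- data.frame(
  Binary_Output = c(0, 0, 0, 1, 0, 0),
  Time = c(0, 0.000983, 0.001966, 0.002949, 0.003932, 0.005000)
)

df$duration <- duration_since_one(df$Binary_Output, df$Time)
print(df)
```

Because the helper takes plain vectors, it should also work inside mutate(), for example mutate(duration = duration_since_one(Binary_Output, Time)).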
CodePudding user response:
With data.table (rows before the first 1 come out as NA here rather than keeping their Time):
library(data.table)
setDT(df)
df[, DurationSince1 := Time - nafill(fifelse(Binary_Output == 1, Time, NA), type = "locf")][]
   Binary_Output     Time DurationSince1
           <int>    <num>          <num>
1:             0 0.000000             NA
2:             0 0.000983             NA
3:             0 0.001966             NA
4:             1 0.002949       0.000000
5:             0 0.003932       0.000983
6:             0 0.005000       0.002051
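The question's expected output keeps Time for the rows before the first 1, whereas the output above shows NA there. One possible way to close that gap, sketched here as an assumption rather than as part of the original answer, is to wrap the expression in data.table's fcoalesce so the NA rows fall back to Time. The variable name dt is only illustrative.

```r
library(data.table)

# Sample data from the question, rebuilt so the example is self-contained
dt <- data.table(
  Binary_Output = c(0L, 0L, 0L, 1L, 0L, 0L),
  Time = c(0, 0.000983, 0.001966, 0.002949, 0.003932, 0.005000)
)

# Carry the time of the latest 1 forward, subtract it from Time,
# and fall back to Time itself where no 1 has occurred yet (NA rows)
dt[, DurationSince1 := fcoalesce(
  Time - nafill(fifelse(Binary_Output == 1, Time, NA_real_), type = "locf"),
  Time
)]

print(dt)
```

Since nafill with type = "locf" carries forward the most recent non-missing value, this variant also restarts the duration at every later row where Binary_Output == 1.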