I have a data frame in which choices were made sequentially within sessions. I would like to create a variable giving the order number of every choice. The problem is that I only know which choice was the first one in each session, and I want to know the order of every choice.
Let's say we have a choice and a signal telling us whether it was the first choice in a session or not. Let's also assume that the data is ordered. What I would like to obtain is a third column (order) indicating the choice order, so that every time signal is 1 the order is 1, and it then goes up (2, 3, ...) until the next 1.
df = data.frame(
choice = c('a','a','b','e','a','l','d','a'),
signal = c(1,0,0,1,0,0,0,0),
order = c(1,2,3,1,2,3,4,5))
choice signal order
1 a 1 1
2 a 0 2
3 b 0 3
4 e 1 1
5 a 0 2
6 l 0 3
7 d 0 4
8 a 0 5
So I tried to solve this with map, but it did not work for an obvious reason: I don't know how to update a vector outside of the map.
my_order = df$signal
map(
.x = seq(1,(df$signal %>% length())),
.f = function(x) {
my_order[x] = ifelse(my_order[x]==1, my_order[x], my_order[x-1] + 1)
my_order})
Any idea how I can do that with map, or with something else? I am trying to avoid a for loop.
CodePudding user response:
You can use ave and create a sequence with seq_along within groups defined by cumsum(signal == 1) (or just cumsum(signal), since signal only contains 0-1 values, as pointed out by @philliptomk).
df$order <- with(df, ave(signal, cumsum(signal == 1), FUN = seq_along))
df
# choice signal order
# 1 a 1 1
# 2 a 0 2
# 3 b 0 3
# 4 e 1 1
# 5 a 0 2
# 6 l 0 3
# 7 d 0 4
# 8 a 0 5
Or use group_by and row_number from dplyr:
library(dplyr)
df %>%
group_by(gp = cumsum(signal == 1)) %>%
mutate(order = row_number())
Or use data.table::rowid:
data.table::rowid(cumsum(df$signal == 1))
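To see why the cumsum grouping used in the snippets above works, the intermediate values can be printed step by step. This is only a sketch on the toy signal vector from the question; the names grp and ord are introduced here for illustration.

```r
# Toy signal column copied from the question
signal <- c(1, 0, 0, 1, 0, 0, 0, 0)

# Every 1 starts a new session, so the running total labels the sessions
grp <- cumsum(signal == 1)
grp
# [1] 1 1 1 2 2 2 2 2

# Number the rows within each session label
ord <- ave(signal, grp, FUN = seq_along)
ord
# [1] 1 2 3 1 2 3 4 5
```

Any rows that come before the first 1 would end up in a group labelled 0 and be numbered from 1 as well.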
CodePudding user response:
You can use the split-apply-combine strategy:
grp <- cumsum(df$signal)  # session id: increases by 1 at every signal == 1
df <- unsplit(
  lapply(split(df, grp), function(x) {
    x$order <- seq_len(nrow(x))  # number the rows within each session
    x
  }),
  grp  # reattach the pieces into a single data frame
)
# choice signal order
# 1 a 1 1
# 2 a 0 2
# 3 b 0 3
# 4 e 1 1
# 5 a 0 2
# 6 l 0 3
# 7 d 0 4
# 8 a 0 5
CodePudding user response:
Assuming signal is always 1 in the first row, you can use rle:
df$order <- sequence(rle(cumsum(df$signal))$lengths)
df
# choice signal order
#1 a 1 1
#2 a 0 2
#3 b 0 3
#4 e 1 1
#5 a 0 2
#6 l 0 3
#7 d 0 4
#8 a 0 5
Or with which and diff:
sequence(diff(c(which(df$signal==1)-1, nrow(df))))
#[1] 1 2 3 1 2 3 4 5
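The which/diff one-liner is easier to follow when it is unpacked. The sketch below uses the toy signal vector from the question and assumes, as stated above, that the first value is 1; the names starts, bounds, lens and ord are only illustrative.

```r
# Toy signal column copied from the question
signal <- c(1, 0, 0, 1, 0, 0, 0, 0)

# Rows where a session begins
starts <- which(signal == 1)              # 1 4

# Last row before each start, followed by the final row of the data
bounds <- c(starts - 1, length(signal))   # 0 3 8

# Differences between consecutive bounds are the session lengths
lens <- diff(bounds)                      # 3 5

# sequence() concatenates 1:3 and 1:5
ord <- sequence(lens)
ord
# [1] 1 2 3 1 2 3 4 5
```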
CodePudding user response:
Another possible solution, based on purrr::reduce:
library(tidyverse)
df$order2 <- reduce(df$signal, ~ if (.y == 0) {c(.x, .x[length(.x)] + 1)}
else {c(.x, 1)})
df
#> choice signal order order2
#> 1 a 1 1 1
#> 2 a 0 2 2
#> 3 b 0 3 3
#> 4 e 1 1 1
#> 5 a 0 2 2
#> 6 l 0 3 3
#> 7 d 0 4 4
#> 8 a 0 5 5
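The same running-counter idea can probably also be written without purrr, using base R's Reduce with accumulate = TRUE, which keeps every intermediate value instead of growing a vector by hand. This is a minimal sketch, not the approach of the answer above: next_order is a hypothetical helper name, and the code assumes the first signal value is 1.

```r
# Toy signal column copied from the question
signal <- c(1, 0, 0, 1, 0, 0, 0, 0)

# next_order() is a hypothetical helper: restart at 1 when the signal is 1,
# otherwise add 1 to the previous order
next_order <- function(prev, s) if (s == 1) 1 else prev + 1

# With no init, the first accumulated value is signal[1] itself (assumed to be 1)
order_acc <- Reduce(next_order, signal, accumulate = TRUE)
order_acc
# [1] 1 2 3 1 2 3 4 5
```

This also addresses the original difficulty with map: the previous result is passed along as the first argument, so no vector outside the function has to be updated.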
Yet another possible solution, based on dplyr (with data.table::rleid for the grouping):
library(dplyr)
df %>%
group_by(aux = data.table::rleid(signal)) %>%
mutate(order2 = ifelse(signal == 0, 1 + row_number(), signal)) %>%
ungroup %>%
select(-aux)
#> # A tibble: 8 × 4
#> choice signal order order2
#> <chr> <dbl> <dbl> <dbl>
#> 1 a 1 1 1
#> 2 a 0 2 2
#> 3 b 0 3 3
#> 4 e 1 1 1
#> 5 a 0 2 2
#> 6 l 0 3 3
#> 7 d 0 4 4
#> 8 a 0 5 5