Inferring choice order in sequential data, in R

Time:05-02

I have a data frame in which choices were made sequentially within sessions. I would like to create a variable giving the position of every choice within its session. The problem is that I only know which choice was the first one in each session.

Say we have a choice column and a signal column telling us whether a choice was the first in its session or not. Assume also that the data is ordered. I would like to obtain a third column (order) giving the choice order: every time signal is 1, order is 1, and it then counts up (2, 3, ...) until the next 1.

df = data.frame(
  choice = c('a','a','b','e','a','l','d','a'),
  signal = c(1,0,0,1,0,0,0,0),
  order = c(1,2,3,1,2,3,4,5))

  choice signal order
1      a      1     1
2      a      0     2
3      b      0     3
4      e      1     1
5      a      0     2
6      l      0     3
7      d      0     4
8      a      0     5

I tried to solve this with map, but it did not work for an obvious reason: I don't know how to update a vector defined outside of the map call.

my_order = df$signal
map(
  .x = seq(1,(df$signal %>% length())),
  .f = function(x) {
    my_order[x] = ifelse(my_order[x]==1, my_order[x], my_order[x-1] + 1)
    my_order})

Any idea how I can do this with map, or with something else? I am trying to avoid a for loop.

CodePudding user response:

You can use ave to create a sequence with seq_along within groups defined by cumsum(signal == 1) (or just cumsum(signal), since signal only takes the values 0 and 1, as pointed out by @philliptomk).

df$order <- with(df, ave(signal, cumsum(signal == 1), FUN = seq_along))

df
#   choice signal order
# 1      a      1     1
# 2      a      0     2
# 3      b      0     3
# 4      e      1     1
# 5      a      0     2
# 6      l      0     3
# 7      d      0     4
# 8      a      0     5
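
To see why the grouping works, it can help to look at the intermediate vector. The following is a minimal sketch that rebuilds only the signal column from the question; the names grp and ord are introduced here purely for illustration.

```r
signal <- c(1, 0, 0, 1, 0, 0, 0, 0)

# cumsum() increases by one at every session start, so all rows of a
# session share the same group id
grp <- cumsum(signal == 1)
grp
# [1] 1 1 1 2 2 2 2 2

# ave() applies seq_along() within each group and returns the result
# in the original row order
ord <- ave(signal, grp, FUN = seq_along)
ord
# [1] 1 2 3 1 2 3 4 5
```

The same group id is what the dplyr and data.table variants below rely on.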

or use group_by and row_number from dplyr:

library(dplyr)
df %>% 
  group_by(gp = cumsum(signal == 1)) %>% 
  mutate(order = row_number())

or use data.table::rowid:

data.table::rowid(cumsum(df$signal == 1))

CodePudding user response:

You can use the split-apply-combine strategy:


groups <- cumsum(df$signal)  # group id: increases by one at every session start
df <- unsplit(
  lapply(split(df, groups), function(x) {
    x$order <- seq_len(nrow(x))  # number the rows within each session
    x
  }),
  groups  # reattach the pieces into a single data frame
)
df
#   choice signal order
# 1      a      1     1
# 2      a      0     2
# 3      b      0     3
# 4      e      1     1
# 5      a      0     2
# 6      l      0     3
# 7      d      0     4
# 8      a      0     5

CodePudding user response:

Assuming signal is always 1 in the first row, you can use rle:

df$order <- sequence(rle(cumsum(df$signal))$lengths)
df
#  choice signal order
#1      a      1     1
#2      a      0     2
#3      b      0     3
#4      e      1     1
#5      a      0     2
#6      l      0     3
#7      d      0     4
#8      a      0     5

Or with which and diff.

sequence(diff(c(which(df$signal==1)-1, nrow(df))))
#[1] 1 2 3 1 2 3 4 5
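
The which/diff one-liner is easier to follow when split into steps. This is a sketch on the signal vector from the question; the names starts and session_len are made up for the illustration.

```r
signal <- c(1, 0, 0, 1, 0, 0, 0, 0)

# rows where a session begins
starts <- which(signal == 1)
starts
# [1] 1 4

# the gaps between consecutive boundaries, with the last row as the
# closing boundary, are the session lengths
session_len <- diff(c(starts - 1, length(signal)))
session_len
# [1] 3 5

# sequence() expands each length n into 1..n
sequence(session_len)
# [1] 1 2 3 1 2 3 4 5
```

This is where the first-row assumption matters: rows that come before the first 1 are not counted, so the result would be shorter than the data.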

CodePudding user response:

Another possible solution, based on purrr::reduce:

library(tidyverse)

df$order2 <- reduce(df$signal, ~ if (.y == 0) {c(.x, .x[length(.x)] + 1)} 
                    else {c(.x, 1)})

df

#>   choice signal order order2
#> 1      a      1     1      1
#> 2      a      0     2      2
#> 3      b      0     3      3
#> 4      e      1     1      1
#> 5      a      0     2      2
#> 6      l      0     3      3
#> 7      d      0     4      4
#> 8      a      0     5      5
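
A variant of the same idea, sketched here with base R's Reduce and accumulate = TRUE, carries only the previous order value instead of growing a vector at every step. The helper name next_order is hypothetical, and the sketch assumes the first signal is 1, because that first element is used as the starting value.

```r
signal <- c(1, 0, 0, 1, 0, 0, 0, 0)

# reset to 1 at a session start, otherwise add one to the previous value
next_order <- function(prev, s) if (s == 1) 1 else prev + 1

# accumulate = TRUE keeps every intermediate result, not just the last one
ord <- Reduce(next_order, signal, accumulate = TRUE)
ord
# [1] 1 2 3 1 2 3 4 5
```

This stays close to the original map attempt: each step sees the previous result, which is what map alone cannot provide.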

Yet another possible solution, based on dplyr:

library(dplyr)

df %>% 
  group_by(aux = data.table::rleid(signal)) %>% 
  mutate(order2 = ifelse(signal == 0, 1 + row_number(), signal)) %>% 
  ungroup %>% 
  select(-aux)

#> # A tibble: 8 × 4
#>   choice signal order order2
#>   <chr>   <dbl> <dbl>  <dbl>
#> 1 a           1     1      1
#> 2 a           0     2      2
#> 3 b           0     3      3
#> 4 e           1     1      1
#> 5 a           0     2      2
#> 6 l           0     3      3
#> 7 d           0     4      4
#> 8 a           0     5      5