Rowwise name of first and last columns for every run of 1-CodePudding

I have a data frame where there a several columns with prefix col:

df <- data.frame(row_id = c(100, 101, 102, 103, 104, 105, 106, 107, 108, 109),
             col1 = c(1,0,1,1,1,0,0,1,1,0),
             col2 = c(1,1,1,0,0,1,1,1,0,0),
             col3 = c(0,0,1,0,0,1,1,1,1,0),
             col4 = c(1,1,1,0,0,1,0,1,1,1),
             col5 = c(1,1,0,0,1,0,1,0,0,1))

I want to be able to create a separate table where the first column is row_id, and for every row, I want to get the names of the first and last column for every run of 1. For example, the output for row_id = 100 would be:

| row_id | first_col | last_col |
| 100    | col1      | col2     |
| 100    | col4      | col5     |

I'll need to apply this method to a much larger data frame, about a thousand columns per row, so any help on that would be much appreaciated.

Thank you!

CodePudding user response：

Remove if (.N > 1) to include length-1 "run"s

df <- data.frame(row_id = c(100, 101, 102, 103, 104, 105, 106, 107, 108, 109),
             col1 = c(1,0,1,1,1,0,0,1,1,0),
             col2 = c(1,1,1,0,0,1,1,1,0,0),
             col3 = c(0,0,1,0,0,1,1,1,1,0),
             col4 = c(1,1,1,0,0,1,0,1,1,1),
             col5 = c(1,1,0,0,1,0,1,0,0,1))

library(data.table)
library(magrittr)

setDT(df) %>% 
  melt('row_id') %>% 
  .[, rid := rleid(value), keyby = row_id] %>% 
  .[value == 1, 
      if (.N > 1)
      .(first_col = first(variable), 
        last_col = last(variable)),
   by = .(row_id, rid)] %>% 
  .[, -'rid']
#>    row_id first_col last_col
#> 1:    100      col1     col2
#> 2:    100      col4     col5
#> 3:    101      col4     col5
#> 4:    102      col1     col4
#> 5:    105      col2     col4
#> 6:    106      col2     col3
#> 7:    107      col1     col4
#> 8:    108      col3     col4
#> 9:    109      col4     col5

^{Created on 2021-11-24 by the reprex package (v2.0.1)}