Home > Software engineering >  Column Values Depend on Non-NA entries in Other Columns
Column Values Depend on Non-NA entries in Other Columns

Time:05-22

I am in need of a way to devise a column (data_sequence) that uses the following logic as based off values in other columns (col_1:col_5).

  1. The column elements in col_1:col_5 contain both 1) values and 2) NA entries.

  2. The flow of construction of the 'data_sequence' column moves from right to left within col_1:col_5 .

  3. Initially, 'data_sequence' assumes the values of the rightmost column (col_5) until the first instance of NA is hit in that column.

  4. col_4 then becomes the relevant column upon which to harvest data. col_4 values are then assumed for 'data_sequence' in corresponding rows until its first instance of NA appears.

The process continues through col_1, at which point 'data_sequence' is fully populated with the appropriate values.

The values present in col_1:col_5 stair-step downward from right to left as this sample data indicates. That is, values in adjacent columns may begin in the same row, but values never begin at a lower row value in the left column of any adjacent pair of columns.

Once 'data_sequence' is populated, I also need a column (column_offset) that provides the column offset relative to the first column (row).

Any solutions, elegant or otherwise, are greatly appreciated.

enter image description here

sixteen_tons <- tibble(row = 1:12,
               col_1 = c( rep(NA, 5),  1:7), 
               col_2 = c( rep(NA, 3) , 15:21, rep(NA, 2) ),
               col_3 = c( rep(NA, 2) , 33:39, rep(NA, 3) ),
               col_4 = c( rep(NA, 2) , 55:59, rep(NA, 5) ),
               col_5 = c( 91:93, rep(NA, 9) ),
               data_sequence = c(91:93, 56:59, 38:39, 21, 6:7),
               column_offset = c(rep(5,3), rep(4,4), rep(3,2), rep(2,1), rep(1,2)   )
)

CodePudding user response:

We could use coalesce with max.col

library(dplyr)
library(purrr)
sixteen_tons %>% 
  mutate(data_sequence2 = invoke(coalesce, 
     rev(across(starts_with('col_')))),
    column_offset2 = max.col(!is.na(across(starts_with('col_'))), 'last'))

-output

# A tibble: 12 × 10
     row col_1 col_2 col_3 col_4 col_5 data_sequence column_offset data_sequence2 column_offset2
   <int> <int> <int> <int> <int> <int>         <dbl>         <dbl>          <int>          <int>
 1     1    NA    NA    NA    NA    91            91             5             91              5
 2     2    NA    NA    NA    NA    92            92             5             92              5
 3     3    NA    NA    33    55    93            93             5             93              5
 4     4    NA    15    34    56    NA            56             4             56              4
 5     5    NA    16    35    57    NA            57             4             57              4
 6     6     1    17    36    58    NA            58             4             58              4
 7     7     2    18    37    59    NA            59             4             59              4
 8     8     3    19    38    NA    NA            38             3             38              3
 9     9     4    20    39    NA    NA            39             3             39              3
10    10     5    21    NA    NA    NA            21             2             21              2
11    11     6    NA    NA    NA    NA             6             1              6              1
12    12     7    NA    NA    NA    NA             7             1              7              1
  •  Tags:  
  • r
  • Related