Home > database >  R output matrix index with values in dataframe
R output matrix index with values in dataframe

Time:05-14

I am trying to find the "matrix index" from values of dataframe in the Position column. The "matrix" that I would like to reference to is either a 3 x 3 or 4 x 4 matrix, depending on the length of the Position column (1:9 for 3 x 3 and 1:16 for 4 x 4). Different groups in col1 would have different length of Position.

Here's a dummy dataframe to demonstrate my problem.

df <- structure(list(col1 = c("group1", "group1", "group1", "group1", 
"group1", "group1", "group1", "group1", "group1", "group2", "group2", 
"group2", "group2", "group2", "group2", "group2", "group2", "group2", 
"group2", "group2", "group2", "group2", "group2", "group2", "group2", 
"group3", "group3", "group3", "group3", "group3", "group3", "group3", 
"group3", "group3", "group3", "group3", "group3", "group3"), 
    col2 = c("A", "Q", NA, "A", "K", "L", "O", "R", "J", "S", 
    "C", "S", "H", "O", "T", "Z", "D", "Y", "J", "V", "Z", "P", 
    "L", "X", "D", "K", "M", "X", "E", "P", "U", "Z", "Z", "L", 
    "W", "X", "F", "K"), Position = c(1L, 2L, 3L, 4L, 5L, 6L, 
    7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 
    12L, 13L, 14L, 15L, 16L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
    9L, 10L, 11L, 12L, 13L)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -38L))

Rules

From this dataframe, I would like to get a new column Position_ij specifying the ith and jth Position if it were in a matrix.

For example, "group1" has a Position of length 9, therefore, it should reference a 3 x 3 matrix, and the Position_ij should be 1 = "[1, 1]", 2 = "[1, 2]", 3 = "[1, 3]", 4 = "[2, 1]" ..., 9 = "[3, 3]".

For "group2", it has a Position length of 16, therefore it should reference a 4 x 4 matrix, and the Position_ij should be 1 = "[1, 1]", ..., 4 = "[1, 4]", 5 = "[2, 1]" ..., 16 = "[4, 4]".

For "group3", it has a Position length of 13, which is greater than 9, therefore it should reference a 4 x 4 matrix.

Current attempt (failed)

My current method uses %/% and %% to get the quotient and remainder of Position divided by matrix length, however, when Position == matrix length, the remainder is 0 but instead I want 3 or 4.

library(dplyr)

df %>% group_by(col1) %>% 
  mutate(Position_ij = if (n() == 9) {
      paste0("[", (Position %/% 3)   1, ", ", Position %% 3, "]")
    } else {
      paste0("[", (Position %/% 4)   1, ", ", Position %% 4, "]")
      })
# A tibble: 38 × 4
# Groups:   col1 [3]
   col1   col2  Position Position_ij
   <chr>  <chr>    <int> <chr>      
 1 group1 A            1 [1, 1]     
 2 group1 Q            2 [1, 2]     
 3 group1 NA           3 [2, 0]     # this should be [1, 3]
 4 group1 A            4 [2, 1]     
 5 group1 K            5 [2, 2]     
 6 group1 L            6 [3, 0]     # this should be [2, 3]
 7 group1 O            7 [3, 1]     
 8 group1 R            8 [3, 2]     
 9 group1 J            9 [4, 0]     # this should be [3, 3]
10 group2 S            1 [1, 1]     
# … with 28 more rows

Desired output

   col1   col2  Position Position_ij
   <chr>  <chr>    <int> <chr>      
 1 group1 A            1 [1, 1]     
 2 group1 Q            2 [1, 2]     
 3 group1 NA           3 [1, 3]     
 4 group1 A            4 [2, 1]     
 5 group1 K            5 [2, 2]     
 6 group1 L            6 [2, 3]     
 7 group1 O            7 [3, 1]     
 8 group1 R            8 [3, 2]     
 9 group1 J            9 [3, 3]     
10 group2 S            1 [1, 1]     
11 group2 C            2 [1, 2]     
12 group2 S            3 [1, 3]     
13 group2 H            4 [1, 4]     
14 group2 O            5 [2, 1]     
15 group2 T            6 [2, 2]     
16 group2 Z            7 [2, 3]     
17 group2 D            8 [2, 4]     
18 group2 Y            9 [3, 1]     
19 group2 J           10 [3, 2]     
20 group2 V           11 [3, 3]     
21 group2 Z           12 [3, 4]     
22 group2 P           13 [4, 1]     
23 group2 L           14 [4, 2]     
24 group2 X           15 [4, 3]     
25 group2 D           16 [4, 4]     
26 group3 K            1 [1, 1]     
27 group3 M            2 [1, 2]     
28 group3 X            3 [1, 3]     
29 group3 E            4 [1, 4]     
30 group3 P            5 [2, 1]     
31 group3 U            6 [2, 2]     
32 group3 Z            7 [2, 3]     
33 group3 Z            8 [2, 4]     
34 group3 L            9 [3, 1]     
35 group3 W           10 [3, 2]     
36 group3 X           11 [3, 3]     
37 group3 F           12 [3, 4]     
38 group3 K           13 [4, 1]        

FYI, my reference matrix should either be 9 x 9 or 10 x 10 in reality.

CodePudding user response:

Subtract 1 from 'Position' before the %/%/%% and then add 1 on the result

library(dplyr)
out <- df %>% 
  group_by(col1) %>% 
  mutate(Position_ij = if (n() == 9) {
      paste0("[", ((Position-1) %/% 3)   1, ", ", (Position-1) %% 3   1, "]")
    } else {
      paste0("[", ((Position-1) %/% 4)   1, ", ", (Position-1) %% 4   1, "]")
      }) %>%
  ungroup

-output

> as.data.frame(out)
     col1 col2 Position Position_ij
1  group1    A        1      [1, 1]
2  group1    Q        2      [1, 2]
3  group1 <NA>        3      [1, 3]
4  group1    A        4      [2, 1]
5  group1    K        5      [2, 2]
6  group1    L        6      [2, 3]
7  group1    O        7      [3, 1]
8  group1    R        8      [3, 2]
9  group1    J        9      [3, 3]
10 group2    S        1      [1, 1]
11 group2    C        2      [1, 2]
12 group2    S        3      [1, 3]
13 group2    H        4      [1, 4]
14 group2    O        5      [2, 1]
15 group2    T        6      [2, 2]
16 group2    Z        7      [2, 3]
17 group2    D        8      [2, 4]
18 group2    Y        9      [3, 1]
19 group2    J       10      [3, 2]
20 group2    V       11      [3, 3]
21 group2    Z       12      [3, 4]
22 group2    P       13      [4, 1]
23 group2    L       14      [4, 2]
24 group2    X       15      [4, 3]
25 group2    D       16      [4, 4]
26 group3    K        1      [1, 1]
27 group3    M        2      [1, 2]
28 group3    X        3      [1, 3]
29 group3    E        4      [1, 4]
30 group3    P        5      [2, 1]
31 group3    U        6      [2, 2]
32 group3    Z        7      [2, 3]
33 group3    Z        8      [2, 4]
34 group3    L        9      [3, 1]
35 group3    W       10      [3, 2]
36 group3    X       11      [3, 3]
37 group3    F       12      [3, 4]
38 group3    K       13      [4, 1]

Or use gl/rowid

library(data.table)
out2 <- df %>%
   group_by(col1) %>% 
   mutate(Position_ij = sprintf('[%d, %d]', 
      as.integer(gl(n(), c(4, 3)[1   !n()%%3], n())),
      rowid(as.integer(gl(n(), c(4, 3)[1   !n()%%3], n()))))) %>% 
   ungroup

-testing

> identical(out2, out)
  • Related