I am trying to find the "matrix index" from values of dataframe in the Position
column. The "matrix" that I would like to reference to is either a 3 x 3 or 4 x 4 matrix, depending on the length of the Position
column (1:9
for 3 x 3 and 1:16
for 4 x 4). Different groups in col1
would have different length of Position
.
Here's a dummy dataframe to demonstrate my problem.
df <- structure(list(col1 = c("group1", "group1", "group1", "group1",
"group1", "group1", "group1", "group1", "group1", "group2", "group2",
"group2", "group2", "group2", "group2", "group2", "group2", "group2",
"group2", "group2", "group2", "group2", "group2", "group2", "group2",
"group3", "group3", "group3", "group3", "group3", "group3", "group3",
"group3", "group3", "group3", "group3", "group3", "group3"),
col2 = c("A", "Q", NA, "A", "K", "L", "O", "R", "J", "S",
"C", "S", "H", "O", "T", "Z", "D", "Y", "J", "V", "Z", "P",
"L", "X", "D", "K", "M", "X", "E", "P", "U", "Z", "Z", "L",
"W", "X", "F", "K"), Position = c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 13L, 14L, 15L, 16L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 13L)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -38L))
Rules
From this dataframe, I would like to get a new column Position_ij
specifying the ith and jth Position
if it were in a matrix.
For example, "group1" has a Position
of length 9, therefore, it should reference a 3 x 3 matrix, and the Position_ij
should be 1 = "[1, 1]", 2 = "[1, 2]", 3 = "[1, 3]", 4 = "[2, 1]" ..., 9 = "[3, 3]".
For "group2", it has a Position
length of 16, therefore it should reference a 4 x 4 matrix, and the Position_ij
should be 1 = "[1, 1]", ..., 4 = "[1, 4]", 5 = "[2, 1]" ..., 16 = "[4, 4]".
For "group3", it has a Position
length of 13, which is greater than 9, therefore it should reference a 4 x 4 matrix.
Current attempt (failed)
My current method uses %/%
and %%
to get the quotient and remainder of Position
divided by matrix length, however, when Position
== matrix length, the remainder is 0 but instead I want 3 or 4.
library(dplyr)
df %>% group_by(col1) %>%
mutate(Position_ij = if (n() == 9) {
paste0("[", (Position %/% 3) 1, ", ", Position %% 3, "]")
} else {
paste0("[", (Position %/% 4) 1, ", ", Position %% 4, "]")
})
# A tibble: 38 × 4
# Groups: col1 [3]
col1 col2 Position Position_ij
<chr> <chr> <int> <chr>
1 group1 A 1 [1, 1]
2 group1 Q 2 [1, 2]
3 group1 NA 3 [2, 0] # this should be [1, 3]
4 group1 A 4 [2, 1]
5 group1 K 5 [2, 2]
6 group1 L 6 [3, 0] # this should be [2, 3]
7 group1 O 7 [3, 1]
8 group1 R 8 [3, 2]
9 group1 J 9 [4, 0] # this should be [3, 3]
10 group2 S 1 [1, 1]
# … with 28 more rows
Desired output
col1 col2 Position Position_ij
<chr> <chr> <int> <chr>
1 group1 A 1 [1, 1]
2 group1 Q 2 [1, 2]
3 group1 NA 3 [1, 3]
4 group1 A 4 [2, 1]
5 group1 K 5 [2, 2]
6 group1 L 6 [2, 3]
7 group1 O 7 [3, 1]
8 group1 R 8 [3, 2]
9 group1 J 9 [3, 3]
10 group2 S 1 [1, 1]
11 group2 C 2 [1, 2]
12 group2 S 3 [1, 3]
13 group2 H 4 [1, 4]
14 group2 O 5 [2, 1]
15 group2 T 6 [2, 2]
16 group2 Z 7 [2, 3]
17 group2 D 8 [2, 4]
18 group2 Y 9 [3, 1]
19 group2 J 10 [3, 2]
20 group2 V 11 [3, 3]
21 group2 Z 12 [3, 4]
22 group2 P 13 [4, 1]
23 group2 L 14 [4, 2]
24 group2 X 15 [4, 3]
25 group2 D 16 [4, 4]
26 group3 K 1 [1, 1]
27 group3 M 2 [1, 2]
28 group3 X 3 [1, 3]
29 group3 E 4 [1, 4]
30 group3 P 5 [2, 1]
31 group3 U 6 [2, 2]
32 group3 Z 7 [2, 3]
33 group3 Z 8 [2, 4]
34 group3 L 9 [3, 1]
35 group3 W 10 [3, 2]
36 group3 X 11 [3, 3]
37 group3 F 12 [3, 4]
38 group3 K 13 [4, 1]
FYI, my reference matrix should either be 9 x 9 or 10 x 10 in reality.
CodePudding user response:
Subtract 1 from 'Position' before the %/%
/%%
and then add 1 on the result
library(dplyr)
out <- df %>%
group_by(col1) %>%
mutate(Position_ij = if (n() == 9) {
paste0("[", ((Position-1) %/% 3) 1, ", ", (Position-1) %% 3 1, "]")
} else {
paste0("[", ((Position-1) %/% 4) 1, ", ", (Position-1) %% 4 1, "]")
}) %>%
ungroup
-output
> as.data.frame(out)
col1 col2 Position Position_ij
1 group1 A 1 [1, 1]
2 group1 Q 2 [1, 2]
3 group1 <NA> 3 [1, 3]
4 group1 A 4 [2, 1]
5 group1 K 5 [2, 2]
6 group1 L 6 [2, 3]
7 group1 O 7 [3, 1]
8 group1 R 8 [3, 2]
9 group1 J 9 [3, 3]
10 group2 S 1 [1, 1]
11 group2 C 2 [1, 2]
12 group2 S 3 [1, 3]
13 group2 H 4 [1, 4]
14 group2 O 5 [2, 1]
15 group2 T 6 [2, 2]
16 group2 Z 7 [2, 3]
17 group2 D 8 [2, 4]
18 group2 Y 9 [3, 1]
19 group2 J 10 [3, 2]
20 group2 V 11 [3, 3]
21 group2 Z 12 [3, 4]
22 group2 P 13 [4, 1]
23 group2 L 14 [4, 2]
24 group2 X 15 [4, 3]
25 group2 D 16 [4, 4]
26 group3 K 1 [1, 1]
27 group3 M 2 [1, 2]
28 group3 X 3 [1, 3]
29 group3 E 4 [1, 4]
30 group3 P 5 [2, 1]
31 group3 U 6 [2, 2]
32 group3 Z 7 [2, 3]
33 group3 Z 8 [2, 4]
34 group3 L 9 [3, 1]
35 group3 W 10 [3, 2]
36 group3 X 11 [3, 3]
37 group3 F 12 [3, 4]
38 group3 K 13 [4, 1]
Or use gl/rowid
library(data.table)
out2 <- df %>%
group_by(col1) %>%
mutate(Position_ij = sprintf('[%d, %d]',
as.integer(gl(n(), c(4, 3)[1 !n()%%3], n())),
rowid(as.integer(gl(n(), c(4, 3)[1 !n()%%3], n()))))) %>%
ungroup
-testing
> identical(out2, out)