I have a data frame with two columns that I want to cross tabulate. The data also includes the counts for the combination. I am trying to create the cross table and include those counts within the table. I am struggling to use the counts from the dataframe into the cross table.
> df %>% arrange(d1)%>% head()
count d1 d2
1 3 1 15
2 86 1 14
3 13 1 12
4 186 1 16
5 29 1 9
6 86 1 13
> table(df$d1,df$d2)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
3 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Expecting [1,15] and [1,14] to show 3, 86 based on the counts in df table. Right now it shows 0s and 1s only based on if the combinations exists.
Here is my sample data:
structure(list(count = c(37L, 6L, 44L, 21L, 8L, 3L, 9L, 17L,
13L, 32L, 106L, 34L, 505L, 173L, 12L, 2L, 4L, 45L, 3L, 43L, 5L,
16L, 1L, 27L, 17L, 3L, 4L, 1L, 27L, 86L, 79L, 10L, 161L, 32L,
3L, 209L, 9L, 83L, 23L, 108L, 161L, 22L, 4L, 16L, 2L, 6L, 67L,
86L, 3L, 1L, 14L, 14L, 111L, 5L, 5L, 44L, 105L, 13L, 269L, 186L,
3L, 5L, 5L, 27L, 3L, 186L, 58L, 29L, 34L, 43L, 8L, 92L, 9L, 455L,
22L, 32L, 4L, 14L, 58L, 22L, 190L, 94L, 27L, 152L, 264L, 36L,
1L, 505L, 86L, 44L, 3L, 1L, 79L, 75L, 12L, 32L, 11L, 197L, 90L,
269L, 9L, 6L, 47L, 14L, 158L, 303L, 335L, 37L, 33L, 3L, 83L,
15L, 31L, 124L, 146L, 26L, 36L, 27L, 37L, 31L, 108L, 121L, 111L,
11L, 5L, 26L, 166L, 11L, 18L, 11L, 8L, 15L, 18L, 165L, 80L, 14L,
5L, 3L, 492L, 7L, 90L, 146L, 130L, 197L, 165L, 34L, 22L, 122L,
29L, 74L, 455L, 303L, 45L, 5L, 173L, 33L, 24L, 229L, 79L, 43L,
68L, 16L, 10L, 73L, 35L, 99L, 229L, 94L, 23L, 492L, 18L, 84L,
92L, 86L, 35L, 31L, 1L, 23L, 8L, 121L, 1L, 173L, 400L, 124L,
20L, 11L, 6L, 3L, 166L, 84L, 31L, 122L, 15L, 24L, 70L, 43L, 74L,
209L, 45L, 158L, 44L, 15L, 37L, 35L, 27L, 68L, 20L, 15L, 11L,
21L, 4L, 18L, 44L, 234L, 80L, 10L, 44L, 4L, 47L, 7L, 67L, 10L,
3L, 173L, 99L, 79L, 130L, 3L, 75L, 1L, 335L, 14L, 106L, 15L,
34L, 190L, 152L, 16L, 73L, 45L, 1L, 3L, 264L, 160L, 23L, 1L,
160L, 400L, 105L, 234L, 70L, 35L), d1 = c(10L, 17L, 5L, 3L, 12L,
1L, 10L, 10L, 12L, 7L, 14L, 6L, 16L, 3L, 7L, 9L, 7L, 13L, 4L,
8L, 9L, 2L, 7L, 16L, 8L, 15L, 12L, 12L, 2L, 1L, 16L, 15L, 14L,
5L, 8L, 14L, 11L, 11L, 4L, 4L, 13L, 7L, 12L, 11L, 17L, 8L, 4L,
13L, 15L, 15L, 12L, 13L, 4L, 5L, 5L, 5L, 2L, 1L, 2L, 1L, 2L,
13L, 12L, 5L, 3L, 16L, 10L, 1L, 14L, 2L, 7L, 9L, 15L, 16L, 3L,
11L, 8L, 12L, 9L, 9L, 14L, 11L, 8L, 11L, 16L, 10L, 17L, 6L, 1L,
3L, 5L, 1L, 3L, 11L, 10L, 14L, 5L, 3L, 6L, 16L, 15L, 15L, 4L,
14L, 14L, 16L, 16L, 8L, 3L, 7L, 1L, 15L, 6L, 11L, 6L, 5L, 1L,
15L, 2L, 7L, 14L, 2L, 13L, 10L, 6L, 1L, 3L, 15L, 2L, 3L, 9L,
7L, 11L, 3L, 10L, 16L, 17L, 7L, 3L, 15L, 1L, 2L, 10L, 13L, 4L,
5L, 8L, 4L, 9L, 16L, 13L, 4L, 10L, 17L, 6L, 8L, 7L, 11L, 8L,
9L, 16L, 7L, 14L, 9L, 4L, 3L, 13L, 4L, 8L, 16L, 8L, 6L, 14L,
14L, 9L, 13L, 17L, 12L, 10L, 1L, 17L, 11L, 16L, 2L, 1L, 7L, 14L,
12L, 2L, 9L, 8L, 6L, 4L, 13L, 9L, 6L, 5L, 6L, 12L, 11L, 4L, 2L,
14L, 12L, 11L, 7L, 8L, 6L, 1L, 12L, 9L, 12L, 5L, 3L, 6L, 15L,
13L, 8L, 10L, 4L, 1L, 13L, 17L, 13L, 1L, 10L, 14L, 17L, 9L, 2L,
10L, 17L, 2L, 12L, 5L, 3L, 6L, 7L, 3L, 16L, 15L, 5L, 9L, 2L,
6L, 5L, 13L, 11L, 4L, 6L, 13L, 4L), d2 = c(2L, 14L, 4L, 12L,
10L, 15L, 15L, 8L, 1L, 14L, 2L, 5L, 6L, 11L, 10L, 17L, 8L, 10L,
17L, 6L, 5L, 7L, 15L, 15L, 10L, 1L, 9L, 17L, 5L, 14L, 8L, 14L,
13L, 11L, 5L, 6L, 15L, 1L, 8L, 14L, 14L, 3L, 8L, 7L, 9L, 15L,
1L, 1L, 2L, 5L, 13L, 12L, 13L, 12L, 9L, 3L, 4L, 12L, 16L, 16L,
15L, 17L, 5L, 2L, 17L, 1L, 9L, 9L, 5L, 9L, 9L, 14L, 11L, 13L,
7L, 5L, 12L, 14L, 10L, 8L, 3L, 4L, 11L, 6L, 9L, 1L, 1L, 16L,
13L, 5L, 8L, 17L, 10L, 9L, 7L, 7L, 10L, 13L, 1L, 2L, 10L, 8L,
10L, 12L, 11L, 4L, 10L, 14L, 8L, 12L, 11L, 6L, 7L, 2L, 2L, 1L,
10L, 16L, 10L, 6L, 4L, 1L, 4L, 5L, 17L, 5L, 2L, 3L, 8L, 15L,
7L, 4L, 12L, 4L, 6L, 17L, 6L, 5L, 16L, 4L, 6L, 6L, 14L, 3L, 3L,
14L, 9L, 6L, 1L, 5L, 16L, 16L, 13L, 13L, 13L, 3L, 13L, 13L, 16L,
2L, 7L, 2L, 15L, 3L, 12L, 1L, 11L, 11L, 4L, 3L, 2L, 9L, 9L, 1L,
4L, 8L, 12L, 6L, 12L, 2L, 2L, 3L, 11L, 11L, 8L, 1L, 17L, 7L,
3L, 6L, 13L, 4L, 7L, 7L, 13L, 8L, 16L, 14L, 16L, 14L, 5L, 12L,
8L, 4L, 8L, 16L, 1L, 15L, 7L, 3L, 12L, 11L, 13L, 6L, 10L, 13L,
5L, 7L, 4L, 15L, 4L, 15L, 4L, 6L, 3L, 3L, 10L, 3L, 11L, 17L,
16L, 16L, 14L, 2L, 6L, 14L, 11L, 11L, 9L, 12L, 7L, 7L, 16L, 13L,
12L, 15L, 2L, 16L, 2L, 3L, 9L, 9L)), row.names = c(NA, 252L), class = "data.frame")
CodePudding user response:
xtabs
may be useful here
> xtabs(count ~ d1 d2, df)
d2
d1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1 0 121 99 67 26 90 11 20 29 36 83 13 86 86 3 186 1
2 121 0 166 105 27 146 16 18 43 37 124 15 160 106 3 269 1
3 99 166 0 165 44 234 22 33 73 79 173 21 197 190 11 492 3
4 67 105 165 0 44 122 15 23 35 47 94 35 111 108 7 303 3
5 26 27 44 44 0 34 3 3 5 11 32 5 44 34 1 74 0
6 90 146 234 122 34 0 31 43 84 80 152 23 173 209 15 505 5
7 11 16 22 15 3 31 0 4 8 12 16 3 24 32 1 68 0
8 20 18 33 23 3 43 4 0 22 17 27 4 31 37 6 79 0
9 29 43 73 35 5 84 8 22 0 58 75 4 70 92 0 264 2
10 36 37 79 47 11 80 12 17 58 0 0 8 45 130 9 335 0
11 83 124 173 94 32 152 16 27 75 0 0 18 229 158 9 400 0
12 13 15 21 35 5 23 3 4 4 8 18 0 14 14 0 45 1
13 86 160 197 111 44 173 24 31 70 45 229 14 0 161 10 455 5
14 86 106 190 108 34 209 32 37 92 130 158 14 161 0 10 0 6
15 3 3 11 7 1 15 1 6 0 9 9 0 10 10 0 27 0
16 186 269 492 303 74 505 68 79 264 335 400 45 455 0 27 0 14
17 1 1 3 3 0 5 0 0 2 0 0 1 5 6 0 14 0
Convert to data.frame
if required
as.data.frame.matrix(xtabs(count ~ d1 d2, df))
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1 0 121 99 67 26 90 11 20 29 36 83 13 86 86 3 186 1
2 121 0 166 105 27 146 16 18 43 37 124 15 160 106 3 269 1
3 99 166 0 165 44 234 22 33 73 79 173 21 197 190 11 492 3
4 67 105 165 0 44 122 15 23 35 47 94 35 111 108 7 303 3
5 26 27 44 44 0 34 3 3 5 11 32 5 44 34 1 74 0
6 90 146 234 122 34 0 31 43 84 80 152 23 173 209 15 505 5
7 11 16 22 15 3 31 0 4 8 12 16 3 24 32 1 68 0
8 20 18 33 23 3 43 4 0 22 17 27 4 31 37 6 79 0
9 29 43 73 35 5 84 8 22 0 58 75 4 70 92 0 264 2
10 36 37 79 47 11 80 12 17 58 0 0 8 45 130 9 335 0
11 83 124 173 94 32 152 16 27 75 0 0 18 229 158 9 400 0
12 13 15 21 35 5 23 3 4 4 8 18 0 14 14 0 45 1
13 86 160 197 111 44 173 24 31 70 45 229 14 0 161 10 455 5
14 86 106 190 108 34 209 32 37 92 130 158 14 161 0 10 0 6
15 3 3 11 7 1 15 1 6 0 9 9 0 10 10 0 27 0
16 186 269 492 303 74 505 68 79 264 335 400 45 455 0 27 0 14
17 1 1 3 3 0 5 0 0 2 0 0 1 5 6 0 14 0
Or may use dcast
library(data.table)
dcast(df, d1 ~ d2, value.var = 'count')
Key: <d1>
d1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1: 1 NA 121 99 67 26 90 11 20 29 36 83 13 86 86 3 186 1
2: 2 121 NA 166 105 27 146 16 18 43 37 124 15 160 106 3 269 1
3: 3 99 166 NA 165 44 234 22 33 73 79 173 21 197 190 11 492 3
4: 4 67 105 165 NA 44 122 15 23 35 47 94 35 111 108 7 303 3
5: 5 26 27 44 44 NA 34 3 3 5 11 32 5 44 34 1 74 NA
6: 6 90 146 234 122 34 NA 31 43 84 80 152 23 173 209 15 505 5
7: 7 11 16 22 15 3 31 NA 4 8 12 16 3 24 32 1 68 NA
8: 8 20 18 33 23 3 43 4 NA 22 17 27 4 31 37 6 79 NA
9: 9 29 43 73 35 5 84 8 22 NA 58 75 4 70 92 NA 264 2
10: 10 36 37 79 47 11 80 12 17 58 NA NA 8 45 130 9 335 NA
11: 11 83 124 173 94 32 152 16 27 75 NA NA 18 229 158 9 400 NA
12: 12 13 15 21 35 5 23 3 4 4 8 18 NA 14 14 NA 45 1
13: 13 86 160 197 111 44 173 24 31 70 45 229 14 NA 161 10 455 5
14: 14 86 106 190 108 34 209 32 37 92 130 158 14 161 NA 10 NA 6
15: 15 3 3 11 7 1 15 1 6 NA 9 9 NA 10 10 NA 27 NA
16: 16 186 269 492 303 74 505 68 79 264 335 400 45 455 NA 27 NA 14
17: 17 1 1 3 3 NA 5 NA NA 2 NA NA 1 5 6 NA 14 NA