I already asked a related question here, which was probably not sufficiently specified and thus the answers did not solve my problem. Will try to do it better this time.
I have created dataframe df
:
df <- data.frame(names1=c('mouse','dog','cat','cat','mouse','cat','cat','dog','cat','mouse'), names2=c('cat','dog','dog','mouse','cat','cat','mouse','mouse','mouse','mouse'), values=c(11,5,41,25,101,78,12,41,6,77))
There are purposely no "birds" neither in the names1
nor names2
column.
and the following names vector:
dims <- c('dog','mouse','bird','cat')
and an initially empty matrix:
my_matrix <- matrix(data=0,nrow = length(unique(dims)),ncol = length(unique(dims))) rownames(my_matrix) <- c('dog','mouse','cat', 'bird') colnames(my_matrix) <- c('dog','mouse','cat','bird'))
So the empty matrix look like this:
> dog mouse bird cat
> dog 0 0 0 0
> mouse 0 0 0 0
> bird 0 0 0 0
> cat 0 0 0 0
**The Goal: **
The goal is to populate the empty matrix with the information coming from dataframe df
- Count Matrix
In the first matrix, I want to count the occurrences where the vector elements match. So the first matrix should look exactly like this one (based on df$names1
and df$names2
)
> dog mouse bird cat
> dog 1 1 0 0
> mouse 0 1 0 2
> bird 0 0 0 0
> cat 1 3 0 1
- Sum Matrix
In the second matrix, I want to show the sums provided by column df$values
and based on the vector match element given by df$names1
and df$names2
. So the matrix should look exactly like this one
> dog mouse bird cat
> dog 5 41 0 0
> mouse 0 77 0 112
> bird 0 0 0 0
> cat 0 137 0 25
Restrictions:
there is one restriction regarding the shape of the matrix -
The order of rows and columns is given by the dims vector. The purpose is that the matrix should show that there are no birds in df$names1
and df$names2
My approach:
I tried to place the count and sum into each element of the empty matrix by using a for loop which looks like this:
for(i in 1:nrow(my_matrix)){
for(j in 1:ncol(my_matrix)){
my_matrix[i,j] <- sum(df$values & df$names1[i] == df$names2[j] & df$names1[j] == df$names2[i])
}
}
and which provides me with this undesired result
dog mouse bird cat
dog 0 0 0 10
mouse 0 10 0 0
bird 0 0 0 0
cat 10 0 0 0
My intuition tells me that the approach with the for loop is ok but I am not sure how exactly to address each matrix element in the loop and how to define the restrictions within the sum function to get the counts (goal matrix1) and sums (goal matrix2).
Your help will be much appreciated and I am very open to other solutions considering the above-mentioned restrictions (still knowing how to loop through a matrix and assign values to each position would be cool).
CodePudding user response:
To continue your for loop approach - The code you posted only loops through the first 4 elements of the dataframe (e.g. the maximum of nrow() or ncol() of the matrix), and it does not match the names of the matrix columns and rows to the names1 and names2 vectors in the dataframe.
I'm not sure if the following is exactly what you want, but it does loop through all of the dataframe, relate dataframe and matrix names and count/take sums, so you can modify it as you need.
my_matrix_count <- my_matrix
for(i in 1:nrow(my_matrix)){
for(j in 1:ncol(my_matrix)){
my_matrix_count[i,j] <-
length(df$values[df$names1==row.names(my_matrix)[i]
& df$names2==colnames(my_matrix)[j] ]>0)
my_matrix[i,j] <-
sum(df$values[df$names1==row.names(my_matrix)[i]
& df$names2==colnames(my_matrix)[j] ])
}
}
It's good to know looping and indexing, in my opinion, but yes, it can be done in many better ways, such as the answer by jblood94. For another version, you can also try to write:
xtabs(values ~names1 names2, data=df)
Regards, Lars
CodePudding user response:
Using match
and data.table
grouping operations:
library(data.table)
n <- length(dims)
# initialize matrices
m_count <- m_sum <- matrix(0L, n, n, 0, list(dims, dims))
dt <- setDT(df)[
, .(
# the matrix index for the name1-name2 combination
idx = (match(names2, dims) - 1L)*n match(names1, dims),
count = .N, # value for the count matrix
sum = sum(values) # value for the sum matrix
),
c("names1", "names2") # group by
]
# update matrices
m_count[dt$idx] <- dt$count
m_sum[dt$idx] <- dt$sum
m_count
#> dog mouse bird cat
#> dog 1 1 0 0
#> mouse 0 1 0 2
#> bird 0 0 0 0
#> cat 1 3 0 1
m_sum
#> dog mouse bird cat
#> dog 5 41 0 0
#> mouse 0 77 0 112
#> bird 0 0 0 0
#> cat 41 43 0 78
Data:
df <- data.frame(
names1=c('mouse','dog','cat','cat','mouse','cat','cat','dog','cat','mouse'),
names2=c('cat','dog','dog','mouse','cat','cat','mouse','mouse','mouse','mouse'),
values=c(11,5,41,25,101,78,12,41,6,77)
)
dims <- unique(c('dog','mouse','bird','cat'))