Convert dataframe to a matrix based on events' frequency

Time:12-14

I am trying to create a matrix from a dataframe based on the frequency of interaction of pairs of individuals. In the dataframe (example below), I have a list of names under the GIVER and RECIPIENT columns. Each row with a combination of GIVER and RECIPIENT corresponds to one (directed) interaction between the two individuals (interaction dyad).

The matrix I would like to obtain should use, as both row names and column names, all the names of the individuals listed in the columns "GIVER" and "RECIPIENT" (not all individuals appear in both columns). Each row should give the number of interactions that an individual (the rowname) gives to every other individual (each colname). Each column should instead give the number of interactions that an individual (the colname) receives from every other individual (each rowname).

This is an example of my dataframe:

GIVER RECIPIENT
A B
A C
D A
E B
C E
B D

I used this function to obtain the matrix:

my_matrix = function(df){
  tablei = as.data.frame(table(union(df$GIVER, df$RECIPIENT), union(df$GIVER, df$RECIPIENT)))
  nameVals <- sort(unique(unlist(tablei[1:2])))
  matrixi <- matrix(0, length(nameVals), length(nameVals), dimnames = list(nameVals,nameVals))
  matrixi[as.matrix(tablei[c("Var1", "Var2")])] <- tablei[["Freq"]]
  as.data.frame(matrixi)
}

However, there is a problem in the second line of the function (repeated below): the frequency values it returns are all either 0 (for any interaction of an individual with others) or 1 (for the interaction of an individual with itself).

tablei = as.data.frame(table(union(df$GIVER, df$RECIPIENT), union(df$GIVER, df$RECIPIENT)))

Do you have any idea on how to fix the problem?

Thank you for your help!

CodePudding user response:

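A tidyverse approach to your problem. It returns the frequencies in long format, with one row per GIVER/RECIPIENT pair, rather than as a matrix.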

Data

df <-
tibble::tribble(
  ~GIVER, ~RECIPIENT,
     "A",        "B",
     "A",        "C",
     "D",        "A",
     "E",        "B",
     "C",        "E",
     "B",        "D"
  )

Code

library(dplyr)
library(tidyr)

df %>% 
  mutate(freq = 1) %>% 
  tidyr::complete(GIVER,RECIPIENT,fill = list(freq = 0))

Output

# A tibble: 25 x 3
   GIVER RECIPIENT  freq
   <chr> <chr>     <dbl>
 1 A     A             0
 2 A     B             1
 3 A     C             1
 4 A     D             0
 5 A     E             0
 6 B     A             0
 7 B     B             0
 8 B     C             0
 9 B     D             1
10 B     E             0
# ... with 15 more rows
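
If a square matrix is what is needed, here is a minimal base R sketch (not part of the original answer). The reason the original line returns only 0s and 1s appears to be that `union()` yields each name exactly once, so `table(x, x)` cross-tabulates every name with itself and fills only the diagonal. Tabulating the two columns against each other, with both converted to factors that share the same levels, should give the directed counts instead. The names `ids`, `counts` and `interaction_matrix` are illustrative only.

```r
# Example data from the question
df <- data.frame(
  GIVER     = c("A", "A", "D", "E", "C", "B"),
  RECIPIENT = c("B", "C", "A", "B", "E", "D"),
  stringsAsFactors = FALSE
)

# All individuals that appear in either column
ids <- sort(union(df$GIVER, df$RECIPIENT))

# Cross-tabulate GIVER against RECIPIENT; the shared factor levels
# guarantee one row and one column per individual, even for
# individuals who appear in only one of the two columns
counts <- table(
  GIVER     = factor(df$GIVER, levels = ids),
  RECIPIENT = factor(df$RECIPIENT, levels = ids)
)

# Drop the "table" class to get a plain integer matrix
# (rows = givers, columns = recipients)
interaction_matrix <- unclass(counts)
interaction_matrix
```

Because `table()` counts rows, repeated GIVER/RECIPIENT pairs are summed rather than capped at 1. Wrap the result in `as.data.frame.matrix()` if a data frame is preferred, as in the original function.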