How to transform a double matrix into data frame columns

Time:11-05

I'm trying to reshape a square matrix of doubles into a long data frame with one row per pair of genomes.

Something like this:

                genomeA    genomeB    genomeC
       genomeA    1.0        0.5       0.3
       genomeB    0.5        1.0       0.2
       genomeC    0.3        0.2       1.0

into

Genomes1   Genomes2    Value
genomeA    genomeA     1.0
genomeA    genomeB     0.5
genomeA    genomeC     0.3
 .....

I tried to cast it into a data frame, but that didn't change anything. I also tried with tibble:

    df <- corr %>%
      as_tibble() %>%
      setNames(c('GenomesA', 'GenomesB', 'AAI'))

It created a strange matrix, not what I want.

If someone has a clue for me, thanks!

CodePudding user response:

You can use array2DF (introduced in R 4.3.0):

array2DF(m)

But this has the unintended consequence of producing an incorrect second column; you can remedy this like so:

array2DF(m, responseName = "AAI", simplify = FALSE) |>
  transform(AAI = mapply(\(x, y) x[y], AAI, match(Var2, unique(Var2))))

#      Var1    Var2 AAI
# 1 genomeA genomeA 1.0
# 2 genomeB genomeA 0.5
# 3 genomeC genomeA 0.3
# 4 genomeA genomeB 0.5
# 5 genomeB genomeB 1.0
# 6 genomeC genomeB 0.2
# 7 genomeA genomeC 0.3
# 8 genomeB genomeC 0.2
# 9 genomeC genomeC 1.0

Data:

m <- read.table(header = TRUE, text = "         genomeA    genomeB    genomeC
       genomeA    1.0        0.5       0.3
       genomeB    0.5        1.0       0.2
       genomeC    0.3        0.2       1.0")
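
If array2DF is not available (R older than 4.3.0), a base R alternative is to go through as.table() and as.data.frame(). The following is only a sketch: it assumes the input is a numeric matrix with row and column names (built here by hand from the question's values), and the column names Genomes1, Genomes2 and Value are taken from the desired output in the question.

```r
# Matrix from the question, built directly so the example is self-contained.
genomes <- c("genomeA", "genomeB", "genomeC")
m <- matrix(c(1.0, 0.5, 0.3,
              0.5, 1.0, 0.2,
              0.3, 0.2, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(genomes, genomes))

# as.table() lets as.data.frame() emit one row per cell, with the row and
# column names as the first two columns and the cell value as the third.
long <- as.data.frame(as.table(m), responseName = "Value",
                      stringsAsFactors = FALSE)
names(long)[1:2] <- c("Genomes1", "Genomes2")

# The rows come out column by column; sort them to match the order asked for.
long <- long[order(long$Genomes1, long$Genomes2), ]
rownames(long) <- NULL
```

If you start from a data frame such as the m read with read.table above, converting it with as.matrix(m) first should give the same kind of input.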

CodePudding user response:

You may try the following:

library(tidyverse)

corr %>%
  as.data.frame() %>%
  rownames_to_column(var = "Genomes1") %>%
  pivot_longer(cols = -Genomes1, names_to = "Genomes2")

#  Genomes1 Genomes2   value
#  <chr>    <chr>      <dbl>
#1 genomeA  genomeA  -1.00  
#2 genomeA  genomeB  -1.32  
#3 genomeA  genomeC   0.254 
#4 genomeB  genomeA   0.0600
#5 genomeB  genomeB  -0.0602
#6 genomeB  genomeC  -0.594 
#7 genomeC  genomeA  -1.65  
#8 genomeC  genomeB  -0.530 
#9 genomeC  genomeC  -0.390 
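
For context, the attempt in the question most likely failed because as_tibble() on a matrix keeps the three columns and drops the row names, so setNames() only renamed those three columns. Here is the same pipeline applied to the matrix from the question, as a sketch: it adds values_to = "Value" so the third column gets the name asked for, loads only tibble and tidyr instead of the whole tidyverse, and uses the native pipe (R 4.1 or later); those choices are mine, not part of the answer above.

```r
library(tibble)  # rownames_to_column()
library(tidyr)   # pivot_longer()

# Matrix from the question, built directly so the example is self-contained.
genomes <- c("genomeA", "genomeB", "genomeC")
corr <- matrix(c(1.0, 0.5, 0.3,
                 0.5, 1.0, 0.2,
                 0.3, 0.2, 1.0),
               nrow = 3, byrow = TRUE,
               dimnames = list(genomes, genomes))

long <- corr |>
  as.data.frame() |>
  # Move the row names into a real column before they get lost.
  rownames_to_column(var = "Genomes1") |>
  # Stack every other column: its name goes to Genomes2, its value to Value.
  pivot_longer(cols = -Genomes1, names_to = "Genomes2", values_to = "Value")
```

The rows come out one matrix row at a time (genomeA/genomeA, genomeA/genomeB, ...), which is the order shown in the question.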

Sample data

corr <- structure(c(-1.00308806318708, 0.0600357248405599, -1.65288400581753, 
-1.31933414605029, -0.0601936388016965, -0.529547763994527, 0.253656662526024, 
-0.594415786654097, -0.390061373503094), dim = c(3L, 3L), dimnames = list(
    c("genomeA", "genomeB", "genomeC"), c("genomeA", "genomeB", 
    "genomeC")))