I am trying to follow the cui2vecWorkflow example by creating a matrix similar to term_cooccurrence_matrix.rda, which has the following structure:
> cooc<-get(load('~/development/cui2vec/vignettes/term_cooccurrence_matrix.rda'))
> str(cooc)
Formal class 'dsCMatrix' [package "Matrix"] with 7 slots
..@ i : int [1:2366] 0 1 2 0 1 2 3 4 3 5 ...
..@ p : int [1:101] 0 1 2 3 7 8 10 17 19 27 ...
..@ Dim : int [1:2] 100 100
..@ Dimnames:List of 2
.. ..$ : chr [1:100] "C0016875" "C0162770" "C0024730" "C0038689" ...
.. ..$ : chr [1:100] "C0016875" "C0162770" "C0024730" "C0038689" ...
..@ x : num [1:2366] 412 6286 8280 118 110 ...
..@ uplo : chr "U"
..@ factors : list()
The dataframe I have looks like:
> test
CUI1 CUI2 Count
1 C0000699 C3894683 2
2 C0000699 C0101725 1
3 C0000699 C1882413 3
4 C0000699 C0245531 3
5 C0000699 C0068475 2
6 C0000699 C0538927 3
7 C0000699 C0724693 1
8 C0000699 C0216784 2
9 C0000699 C2248020 1
10 C0000699 C0069449 3
...
But when I read it in and convert it to a matrix, the result does not have the same structure:
> mat <- as.matrix(test)
> str(mat)
chr [1:1000000, 1:3] "C0000699" "C0000699" "C0000699" "C0000699" "C0000699" "C0000699" "C0000699" "C0000699" ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:3] "CUI1" "CUI2" "Count"
I then take the next step and coerce the matrix mat to a sparse matrix:
> mat <- as(mat, "sparseMatrix")
> str(mat)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ i : int [1:3000000] 0 1 2 3 4 5 6 7 8 9 ...
..@ p : int [1:4] 0 1000000 2000000 3000000
..@ Dim : int [1:2] 1000000 3
..@ Dimnames:List of 2
.. ..$ : NULL
.. ..$ : chr [1:3] "CUI1" "CUI2" "Count"
..@ x : num [1:3000000] NA NA NA NA NA NA NA NA NA NA ...
..@ factors : list()
but the structure looks wrong.
Trying this, I get an error:
> mat <- as(mat, "dsCMatrix")
Error in asMethod(object) :
not a symmetric matrix; consider forceSymmetric() or symmpart()
In addition: Warning message:
In storage.mode(from) <- "double" : NAs introduced by coercion
So I try this:
> mat <- as(forceSymmetric(mat), "dsCMatrix")
Error in forceSymmetric(mat) :
invalid class 'NA' to dup_mMatrix_as_geMatrix
(I haven't been able to find any examples of how to construct a matrix of class structure("dsCMatrix", package = "Matrix") from a data.frame, so I am just winging it.)
It looks like the Dim and Dimnames aren't defined properly, along with the value of x.
Answer:
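Before the fix, it may help to see why the attempts in the question fail. The sketch below uses a small made-up data frame (the name df and its two rows are illustrative, not the asker's data). as.matrix() on a data frame with mixed column types yields a character matrix with one row per pair, so coercing it to numeric produces NAs, and the result is never a CUI-by-CUI table. Cross-tabulating without shared factor levels also gives a non-square result.

```r
# Illustrative toy data with the same column layout as the question
df <- data.frame(
  CUI1  = c("C0000699", "C0000699"),
  CUI2  = c("C3894683", "C0101725"),
  Count = c(2, 1)
)

m <- as.matrix(df)
typeof(m)  # "character": mixed columns are coerced to a common type
dim(m)     # 2 3: one row per pair, not one row and column per CUI

# Without shared levels, CUI1 has 1 level and CUI2 has 2, so the table is 1 x 2
tab <- xtabs(Count ~ CUI1 + CUI2, data = df)
dim(tab)
```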
Following user20650's comment, first coerce the CUI* columns to factors with the same levels, then use xtabs to create a sparse matrix, and finally make it symmetric with forceSymmetric.
txt <- '
CUI1 CUI2 Count
1 C0000699 C3894683 2
2 C0000699 C0101725 1
3 C0000699 C1882413 3
4 C0000699 C0245531 3
5 C0000699 C0068475 2
6 C0000699 C0538927 3
7 C0000699 C0724693 1
8 C0000699 C0216784 2
9 C0000699 C2248020 1
10 C0000699 C0069449 3
'
test <- read.table(textConnection(txt), header = TRUE)
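library(Matrix)
# Give both CUI columns the same factor levels so the table is square
levls <- Reduce(union, test[1:2])
test[1:2] <- lapply(test[1:2], factor, levels = levls)
# Cross-tabulate the counts into a sparse matrix
res <- xtabs(Count ~ CUI1 + CUI2, data = test, sparse = TRUE)
# Convert to a symmetric sparse matrix (uses the upper triangle by default)
res <- forceSymmetric(res)
class(res)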
#> [1] "dsCMatrix"
#> attr(,"package")
#> [1] "Matrix"
Created on 2022-02-13 by the reprex package (v2.0.1)
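One caveat: forceSymmetric() keeps only one triangle (the upper one by default), so a pair stored in the "lower" order would be dropped. In the sample data every pair lands in the upper triangle, so this does not matter there. As a hedged alternative, here is a minimal sketch that builds the matrix directly with sparseMatrix(), sorting each index pair so that all entries fall in the upper triangle. The toy data frame is made up for illustration, and the sketch assumes each unordered pair should contribute its Count once; repeated pairs are summed.

```r
library(Matrix)

# Illustrative toy data: the second row has its pair in the "lower" order
test <- data.frame(
  CUI1  = c("C0000699", "C0101725", "C0000699"),
  CUI2  = c("C3894683", "C0000699", "C1882413"),
  Count = c(2, 1, 3)
)

# Shared set of CUIs defines both the rows and the columns
levls <- union(test$CUI1, test$CUI2)
i <- match(test$CUI1, levls)
j <- match(test$CUI2, levls)

# Sort each (i, j) pair so every entry lies in the upper triangle,
# then let sparseMatrix() build the symmetric matrix directly
res <- sparseMatrix(
  i = pmin(i, j), j = pmax(i, j), x = test$Count,
  dims = c(length(levls), length(levls)),
  dimnames = list(levls, levls),
  symmetric = TRUE
)
class(res)
```

If the data already lists each pair in both directions, summing would double the counts, so deduplicate first in that case.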