Home > Back-end >  Finding common elements from two columns in symmetric adjacency matrix
Finding common elements from two columns in symmetric adjacency matrix

Time:10-24

I have a sparse symmetric matrix which represents authors of some book. Elements Ai,j and Aj,i are both equal to one if the people associated with indices i and j are coauthors and equal to zero otherwise. I'm trying to find a way in matrix representation such that given two columns (authors), I find their common co-authors. Preferably in Matlab or Julia code representation.

CodePudding user response:

The binary & between the columns, applied elementwise, will return a vector with 1s only where both columns had 1s. You can do a findall over that to then return the indices where the result is 1, which indicates the common co-authors.

julia> A
5×5 SparseMatrixCSC{Bool, Int64} with 12 stored entries:
 ⋅  1  ⋅  1  1
 1  ⋅  ⋅  ⋅  1
 ⋅  ⋅  ⋅  ⋅  1
 1  ⋅  ⋅  ⋅  1
 1  1  1  1  ⋅

julia> common = A[:, 1] .& A[:, 5]
5-element SparseVector{Bool, Int64} with 2 stored entries:
  [2]  =  1
  [4]  =  1

julia> findall(common)
2-element Vector{Int64}:
 2
 4

This finds the common co-authors between authors 1 and 5, in Julia. The . before the & indicates that the operator should be applied elementwise. To generalize this, you can write it as a function like:

julia> function findcommoncoauths(adjmat, author1, author2)
         @views findall(adjmat[:, author1] .& adjmat[:, author2])
       end

(The @views is to avoid unnecessarily allocating new memory for the columns, which is a good practice for performant code.)

CodePudding user response:

In case the author collaboration matrix is big, it is nice to take advantage of sparseness. SparseArrays offers many optimized functions (some exported and some not). For example dot (defined in LinearAlgebra) on SparseVectors which calculates dot product. Since the sparse matrices are stored in columns (CSC format), getting columns as SparseVectors is fast. So to count the number of common co-authors between authors i and j:

@views dot(adjmat[:,i], adjmat[:,j])

works fast. Here is an example:

julia> using SparseArrays, LinearAlgebra

julia> M = sprand(Bool,100,100,0.1);  # random boolean matrix

julia> M = M .| M';            # symetric coauthorship

julia> M[diagind(M)] .= false; # no self-coauthorship

julia> M
100×100 SparseMatrixCSC{Bool, Int64} with 1841 stored entries:
⠊⡠⠗⠀⠕⡛⢲⣢⠼⠥⠀⢀⡂⠲⠦⠋⡐⠰⠆⠲⠊⠖⡷⠿⠅⠘⡫⠱⠳⠔
⠙⠁⣏⠙⡣⢎⠓⢮⢢⠣⡴⢀⡂⣊⢙⣈⣯⢮⡁⣂⣺⡏⡰⢁⠯⢜⠐⣈⠼⠜
⣵⠡⡩⢎⡁⡨⣦⣥⠙⡌⠯⡙⣏⡬⢛⠀⢡⢀⢔⢔⢲⡍⠚⣁⣉⢥⡁⣆⣤⠇
⠸⣲⡹⣄⠌⣿⢴⠓⣝⠒⣂⢕⠏⡱⡒⣠⠶⡰⠆⣈⠑⡖⣇⡐⠺⠠⠄⡖⠹⡦
⠖⡇⠬⡒⡓⠤⢳⠙⠋⠄⠜⢰⡖⠨⡀⡄⣒⡨⠒⡘⠤⠄⠆⠆⢮⢜⠥⠴⠔⠄
⠀⢀⠐⢋⣏⠣⢌⢜⢒⣁⣋⠘⡎⢍⠸⠢⢻⣸⡀⢋⡈⣂⣏⠋⢸⢰⣖⢰⡡⡆
⢨⡈⡨⢨⡋⡽⢏⡡⡘⡉⡎⢍⣊⠘⢯⣍⡸⡪⢅⣫⣾⡉⠶⠀⠿⠈⢥⠽⢅⡍
⡬⠃⡓⢰⠛⠐⠘⣨⠀⠬⠲⡂⡏⢷⣀⠘⠊⢲⠃⡰⠠⡆⢂⠔⣕⢂⡄⣺⠖⠮
⢐⡈⡫⣟⠁⢒⢘⡣⡘⡸⣛⣲⡲⡪⢪⣀⠎⠅⣒⣙⣙⡇⠳⢊⠕⢸⣢⣒⡫⣂
⢨⡁⠡⢨⢐⢕⡈⢡⣘⠠⡤⢈⡥⣱⢉⡠⣜⢸⢤⡷⠨⠍⣹⠀⠀⢰⢺⣫⠬⡥
⢪⠄⡾⠾⡜⠶⢱⠤⠀⠇⠢⢨⡞⠻⠠⠦⠷⠼⡆⠆⠴⠃⢾⠔⠖⠘⢇⠞⠲⡶
⣽⡏⠔⢊⠞⢠⢉⠹⠨⠅⡯⠙⠘⠃⢈⠔⡹⢂⠓⠚⢚⠗⡎⠉⠟⢊⠂⡑⣡⠁
⣁⠁⣋⢇⠇⣜⠚⡂⣊⢗⢒⣒⡛⠃⠱⢙⣑⣁⢀⣀⣘⠁⡻⢁⠺⠂⡿⢚⠟⡐
⢏⡊⡐⢠⠡⢬⢠⠥⢁⡇⢘⣙⣅⡗⣠⣩⢨⢺⡾⣲⣩⠕⢌⠠⣻⢋⠀⠀⢩⣣
⢙⠆⣒⠇⠤⠟⠳⡦⠐⠅⠡⠮⡅⠵⡸⡅⠫⢪⠆⡧⢸⡦⠅⠚⢛⠡⠧⣲⠀⠀

julia> @views dot(M[:,1], M[:,2])
5

The last line shows authors #1 and author #2 have 5 common coauthors.

The question wants a list of common coauthors, and unfortunately dot loses this information by aggregation.

Another answer to this post (by Sundar) provides a solution:

function findcommoncoauths(adjmat, author1, author2)
    @views findall(adjmat[:, author1] .& adjmat[:, author2])
end

giving the result for the above matrix:

julia> findcommoncoauths(M, 1, 2)
5-element Vector{Int64}:
 17
 74
 78
 80
 88

This method employs the .& operator which uses broadcasting and so doesn't take advantage of matrix sparsness. Additionally, although the question does not specify, the common co-authors might be needed just for processing and not necessarily stored, so it would be nice to iterate over them.

The following is an iterator to do so:

struct SparseColumnCommon{T,I}
    iistart::I
    jjstart::I
    iimax::I
    jjmax::I
    parent::SparseMatrixCSC{T,I}
end

function sparsecolumncommon(M::SparseMatrixCSC{T,I}, i::I, j::I) where {T,I}
    ii = first(nzrange(M,i))
    iimax = last(nzrange(M,i))
    jj = first(nzrange(M,j))
    jjmax = last(nzrange(M,j))
    SparseColumnCommon{T,I}(ii, jj, iimax, jjmax, M)
end

Base.eltype(::Type{SparseColumnCommon{T,I}}) where {T,I} = Tuple{I, T, T}
Base.IteratorSize(::Type{SparseColumnCommon{T,I}}) where {T,I} = Iterators.Base.SizeUnknown()
function Base.iterate(it::SparseColumnCommon, state=(it.iistart, it.jjstart))
    ii, jj = state
    while ii <= it.iimax && jj <= it.jjmax
        ri, rj = (it.parent.rowval)[ii], (it.parent.rowval)[jj]
        if ri == rj
            return ((ri, (it.parent.nzval)[ii], (it.parent.nzval)[jj]), (ii 1, jj 1))
        elseif ri < rj
            ii  = 1
        else rj < ri
            jj  = 1
        end
    end
    return nothing
end

With this defined (perhaps in a separate included file), we can use it to define:

findcommoncoauths2(adjmat, author1, author2) = 
  map(first, sparsecolumncommon(adjmat, author1, author2))

And like before:

julia> findcommoncoauths2(M,1,2)
5-element Vector{Int64}:
 17
 74
 78
 80
 88

The iterator can also be used to process common co-authors one by one without allocating an output vector which is perhaps useful.

Finally, perhaps this iterator can also be used for other sparse matrix column tasks. It returns on each iteration a triplet: (row, column1val, column2val).

This is already a long answer, but benchmarks show that using this iterator is faster for sparse and big enough matrices (enough is not too large).

  • Related