Converting a list of triplets (row, column, value) to matrix as pandas df-CodePudding

My question is similar to this one, but still different. I have a list of triplets like the following, representing rows and columns of a matrix with their cell value:

a  = [("g1","g2",7),("g1","g3",5)]

The matrix is symmetrical, so the elements can be provided in any order - meaning that ("g1","g2",7) would imply ("g2","g1",7).

I would like to obtain a pandas df from this list, representing a matrix that has element names on the rows and columns, with missing values if a triplet is not listed in a:

    g1    g2    g3
g1  NaN   7     5
g2  7     NaN   Nan
g3  5     NaN   Nan

can you help me achieve this task in the most efficient way for huge lists?

CodePudding user response：

Maybe not the most elegant of all solution, but it does the job. For your list, identify all the "tags" you have (e.g., g1,g2,....gn).

import numpy as np
import pandas as pd
a  = (("g1","g2",7),("g1","g3",5))
tags = []
for t1, t2, _ in a:
    tags  = [t1, t2]
tags = index = columns = sorted(list(set(tags)))

Once you,ve done that, put them in a list

tags = dict((t, i) for i, t in enumerate(tags))

Create an empty table of the dimension of that list, replace all the 0 values with nan and then replace the nan values corresponding to the right tag and make that into a dataframe

Table = np.empty((len(tags),len(tags)))
Table[:] = np.nan
for t1, t2, corr in a:
    Table[tags[t1]][tags[t2]] = corr
    Table[tags[t2]][tags[t1]] = corr

df = pd.DataFrame(Table, index=index, columns=columns)

which returns

     g1   g2   g3
g1  NaN  7.0  5.0
g2  7.0  NaN  NaN
g3  5.0  NaN  NaN

CodePudding user response：

You're going to need to define an order for your matrix labels as well

import pandas as pd

a  = [("g1","g2",7),("g1","g3",5)]
order = ['g1', 'g2', 'g3']

Pivot your data

a_df = pd.DataFrame(a).pivot(index=0, columns=1, values=2)
a_df.index.name = None
a_df.columns.name = None

Reindex it so that it's got the correct order on both axes

a_df = a_df.reindex(order, axis=1).reindex(order, axis=0)

Add the array to it's transpose, only where the array is NaN (in case you have any values on the diagonal)

a_df = a_df.add(a_df.T[pd.isna(a_df)], fill_value=0)

>>> a_df
1   g1  g2  g3
0           
g1  NaN 7.0 5.0
g2  7.0 NaN NaN
g3  5.0 NaN NaN