Looking for a way to populate a two-dimensional array with values from a DataFrame column in Python

I would like to create a matrix in Python from a DataFrame that holds the frequency of each state-transition combination. For example, with 3 states (a, b, c) and two time periods (1 & 2), I could have a DataFrame with the following columns:

# Import pandas library
import pandas as pd

# initialize list of lists
data = [['a to a', 20],['a to b', 10], ['a to c', 5],
        ['b to a', 7],['b to b', 30],['b to c', 15],
        ['c to a', 9],['c to b', 6],['c to c', 25]]
  
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Transition', 'Frequency'])

From this DataFrame, is there a way to create a 3x3 matrix (array) in Python and populate it with the values from the "Frequency" column?

ex matrix:    a    b   c
            a[20, 10,  5]
            b[ 7, 30, 15]
            c[ 9,  6, 25]

bonus points for adding the axis values of "a, b, c" to the matrix output ;)

CodePudding user response：

You can split the 'x to y' strings into 'x' and 'y' columns, pivot, then convert the result with to_numpy:

a = (df
 .join(df['Transition'].str.split(r'\s*to\s*', expand=True))
 # pandas 2.0+ requires keyword arguments for pivot
 .pivot(index=0, columns=1, values='Frequency')
 .to_numpy()
 )

Output:

array([[20, 10,  5],
       [ 7, 30, 15],
       [ 9,  6, 25]])

Intermediate pivoted DataFrame (before conversion to numpy):

1   a   b   c
0            
a  20  10   5
b   7  30  15
c   9   6  25
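
For the "bonus" part of the question, one option is to keep the pivoted DataFrame, since it already carries the a, b, c labels on both axes. The following is a minimal sketch of that idea, not the answerer's exact code: the column names 'from' and 'to' are illustrative assumptions, and it assumes every label has the exact form 'x to y' with single spaces.

```python
import pandas as pd

data = [['a to a', 20], ['a to b', 10], ['a to c', 5],
        ['b to a', 7], ['b to b', 30], ['b to c', 15],
        ['c to a', 9], ['c to b', 6], ['c to c', 25]]
df = pd.DataFrame(data, columns=['Transition', 'Frequency'])

# Split 'x to y' into two columns; 'from' and 'to' are hypothetical names
parts = df['Transition'].str.split(' to ', expand=True)
parts.columns = ['from', 'to']

# Rows are source states, columns are target states
table = df.join(parts).pivot(index='from', columns='to', values='Frequency')

matrix = table.to_numpy()             # plain 3x3 NumPy array
row_labels = table.index.tolist()     # labels for the rows
col_labels = table.columns.tolist()   # labels for the columns

print(table)
```

Here table is the labelled version and matrix is the bare array, so either can be used depending on whether the axis labels are needed downstream.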

CodePudding user response：

Try the following code:

import pandas as pd
import numpy as np

# initialize list of lists
data = [['a to a', 20],['a to b', 10], ['a to c', 5],
        ['b to a', 7],['b to b', 30],['b to c', 15],
        ['c to a', 9],['c to b', 6],['c to c', 25]]
  
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Transition', 'Frequency'])
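# Start from an empty DataFrame and fill it cell by cell
df1 = pd.DataFrame()
l = []
for i, j in df[['Transition', 'Frequency']].values:
    l.append(j)
    # i[0] and i[-1] are the first and last characters of 'x to y',
    # so this assumes single-character state names
    df1.loc[i[0], i[-1]] = j
arr = np.array(l).reshape(3, 3)  # NumPy array; assumes rows are ordered a, b, c
print(df1)  # shows the row and column labels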

Output of pandas dataframe:

      a     b     c
a  20.0  10.0   5.0
b   7.0  30.0  15.0
c   9.0   6.0  25.0
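
The loop above indexes the first and last character of each label and produces float values. A possible variation, sketched here under the assumption that every label has the exact form 'x to y', splits the label instead and fills a preallocated integer array. The names pairs, states, pos, and labelled are illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

data = [['a to a', 20], ['a to b', 10], ['a to c', 5],
        ['b to a', 7], ['b to b', 30], ['b to c', 15],
        ['c to a', 9], ['c to b', 6], ['c to c', 25]]
df = pd.DataFrame(data, columns=['Transition', 'Frequency'])

# Parse each label into [source, target]; also works for multi-character names
pairs = [t.split(' to ') for t in df['Transition']]

# Sorted list of distinct states and their positions in the matrix
states = sorted({s for pair in pairs for s in pair})
pos = {s: k for k, s in enumerate(states)}

# Preallocate with an integer dtype; transitions absent from the data stay 0
arr = np.zeros((len(states), len(states)), dtype=int)
for (src, dst), freq in zip(pairs, df['Frequency']):
    arr[pos[src], pos[dst]] = freq

# Wrap the array to attach the state labels to both axes
labelled = pd.DataFrame(arr, index=states, columns=states)
print(labelled)
```

Because each value is placed by state name rather than by position in the list, this version does not depend on the order of the rows in df.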