I would like to create a matrix in Python from a dataframe that holds the frequency of each state-transition combination. For example, with 3 states (a, b, c) and two time periods (1 and 2), I could have the following dataframe:
# Import pandas library
import pandas as pd
# initialize list of lists
data = [['a to a', 20], ['a to b', 10], ['a to c', 5],
        ['b to a', 7], ['b to b', 30], ['b to c', 15],
        ['c to a', 9], ['c to b', 6], ['c to c', 25]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Transition', 'Frequency'])
From this dataframe, is there a way to create a 3x3 matrix (array) in Python and populate it with the values from the "Frequency" column?
Example matrix:
     a   b   c
a [ 20, 10,  5]
b [  7, 30, 15]
c [  9,  6, 25]
Bonus points for adding the axis labels "a, b, c" to the matrix output ;)
CodePudding user response:
You can split the 'x to y' strings into 'x' and 'y' columns, pivot, then convert with to_numpy:
a = (df
     .join(df['Transition'].str.split(r'\s*to\s*', expand=True))
     # pivot arguments must be passed by keyword in pandas 2.0 and later
     .pivot(index=0, columns=1, values='Frequency')
     .to_numpy()
)
Output:
array([[20, 10,  5],
       [ 7, 30, 15],
       [ 9,  6, 25]])
Intermediate pivoted DataFrame (before conversion to numpy):
1   a   b   c
0
a  20  10   5
b   7  30  15
c   9   6  25
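For the labelled output the question asks about, here is a minimal self-contained sketch along the same lines. It assumes every label has exactly the form 'x to y' and that each pair appears only once. The names `states`, `matrix`, `From` and `To` are illustrative and do not come from the answer above.

```python
import pandas as pd

data = [['a to a', 20], ['a to b', 10], ['a to c', 5],
        ['b to a', 7], ['b to b', 30], ['b to c', 15],
        ['c to a', 9], ['c to b', 6], ['c to c', 25]]
df = pd.DataFrame(data, columns=['Transition', 'Frequency'])

# Split 'x to y' into two columns (assumed names: 'From' and 'To').
states = df['Transition'].str.split(' to ', expand=True)
states.columns = ['From', 'To']

# Pass index/columns/values by keyword; newer pandas releases require it.
matrix = (df.join(states)
            .pivot(index='From', columns='To', values='Frequency'))

print(matrix)             # labelled 3x3 DataFrame
print(matrix.to_numpy())  # plain NumPy array, labels dropped
```

Keeping `matrix` as a DataFrame preserves the a/b/c labels on both axes, and `matrix.loc['b', 'c']` looks up a single transition count.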
CodePudding user response:
Try the following code:
import pandas as pd
import numpy as np
# initialize list of lists
data = [['a to a', 20],['a to b', 10], ['a to c', 5],
['b to a', 7],['b to b', 30],['b to c', 15],
['c to a', 9],['c to b', 6],['c to c', 25]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Transition', 'Frequency'])
# empty data frame
df1 = pd.DataFrame()
l = []
for i, j in df[['Transition', 'Frequency']].values:
    l.append(j)
    # first and last characters of 'x to y' are the two state labels
    df1.loc[i[0], i[-1]] = j
arr = np.array(l).reshape(3, 3)  # NumPy array; assumes 9 rows already in row-major order
print(df1)  # shows row and column labels
Output of pandas dataframe:
      a     b     c
a  20.0  10.0   5.0
b   7.0  30.0  15.0
c   9.0   6.0  25.0
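The loop stores the counts as floats and relies on a hard-coded 3x3 shape with the rows already in order. As a possible alternative (not part of the answer above), here is a hedged sketch that builds a two-level index and unstacks it. It assumes the labels have the form 'x to y'; the names `pairs`, `freq`, `table`, `From` and `To` are illustrative.

```python
import pandas as pd

data = [['a to a', 20], ['a to b', 10], ['a to c', 5],
        ['b to a', 7], ['b to b', 30], ['b to c', 15],
        ['c to a', 9], ['c to b', 6], ['c to c', 25]]
df = pd.DataFrame(data, columns=['Transition', 'Frequency'])

# Turn each 'x to y' label into an (x, y) pair of index levels.
pairs = df['Transition'].str.split(' to ', expand=True)
index = pd.MultiIndex.from_frame(pairs, names=['From', 'To'])
freq = pd.Series(df['Frequency'].to_numpy(), index=index)

# unstack moves the 'To' level into columns; absent pairs are filled with 0.
table = freq.unstack(fill_value=0)
arr = table.to_numpy()

print(table)  # labelled DataFrame with integer counts
print(arr)    # NumPy array whose shape follows the number of states
```

Because the shape comes from the labels themselves, the same code should work for any number of states and does not depend on the row order of `df`.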