Append matrices generated in for loop that have different col names to an external list of empty dat-CodePudding

I have a large dataset that I am trying to perform various analyses on, but first need to transform into matrices grouped by different variables.

For example, here is a toy dataset:

myData = pd.DataFrame({'dataset': ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog', 'bird', 'bird', 'bird', 'bird'], 
                  'category_1': ['orange', 'orange', 'white', 'white', 'black', 'brown', 'brown', 'black', 'red', 'green', 'red', 'green'], 
                  'category_2': ['this_cat', 'that_cat', 'this_cat', 'that_cat', 'this_dog', 'that_dog', 'this_dog', 'that_dog', 'this_bird', 'that_bird', 'this_bird', 'that_bird'],
                  'values': ['1', '8', '9', '2', '5', '4', '3', '10', '0', '2', '7', '9']
                 })

for i, animals in myData.groupby('dataset'):
    tuples = animals.groupby(['category_1', 'category_2'])['values'].mean().reset_index()
    tuples = pd.DataFrame(tuples)
    matrix = tuples.pivot(index='category_2', columns='category_1', values='values').reset_index()
    display(matrix)

Here I am grouping my data by "animals" and converting each group into a matrix. However, because the column names are not same across my matrices, I am having trouble saving my output into an external empty list or dataframe.

For example, I'd like to save each matrix into a separate dataframe that is dynamically generated depending on the number of groups in my data:

output_dfs = {k: pd.DataFrame([]) for k in myData['dataset']}

Desired output in this case would be 3 separate dataframes that I can access by a name: (the values are based on the toy dataset)

dataset category_1 category_2 green red
bird    0          that_bird  14.5  NaN
bird    1          this_bird  NaN   3.5

dataset category_1 category_2 orange white
cat     0          that_cat   8.0    2.0
cat     1          this_cat   1.0    9.0

dataset category_1 category_2 black brown
dog     0          that_dog   10.0  4.0
dog     1          this_dog   5.0   3.0

CodePudding user response：

I'm not sure what you mean, is this the result you want to achieve?

myData = pd.DataFrame({'dataset': ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog', 'bird', 'bird', 'bird', 'bird'], 
                  'category_1': ['orange', 'orange', 'white', 'white', 'black', 'brown', 'brown', 'black', 'red', 'green', 'red', 'green'], 
                  'category_2': ['this_cat', 'that_cat', 'this_cat', 'that_cat', 'this_dog', 'that_dog', 'this_dog', 'that_dog', 'this_bird', 'that_bird', 'this_bird', 'that_bird'],
                  'values': ['1', '8', '9', '2', '5', '4', '3', '10', '0', '2', '7', '9']
                 })

result = {}
for i, animals in myData.groupby('dataset'):
    tuples = animals.groupby(['category_1', 'category_2'])['values'].mean().reset_index()
    tuples = pd.DataFrame(tuples)
    matrix = tuples.pivot(index='category_2', columns='category_1', values='values').reset_index()
    result[i] = matrix
display(result)