Append matrices generated in a for loop that have different column names to an external list of empty dataframes

Time:12-01

I have a large dataset that I am trying to perform various analyses on, but first need to transform into matrices grouped by different variables.

For example, here is a toy dataset:

import pandas as pd

myData = pd.DataFrame({'dataset': ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog', 'bird', 'bird', 'bird', 'bird'], 
                  'category_1': ['orange', 'orange', 'white', 'white', 'black', 'brown', 'brown', 'black', 'red', 'green', 'red', 'green'], 
                  'category_2': ['this_cat', 'that_cat', 'this_cat', 'that_cat', 'this_dog', 'that_dog', 'this_dog', 'that_dog', 'this_bird', 'that_bird', 'this_bird', 'that_bird'],
                  'values': ['1', '8', '9', '2', '5', '4', '3', '10', '0', '2', '7', '9']
                 })
# 'values' holds strings; convert them to numbers so mean() averages them correctly
myData['values'] = pd.to_numeric(myData['values'])

for i, animals in myData.groupby('dataset'):
    # Average 'values' for each (category_1, category_2) pair within this animal
    tuples = animals.groupby(['category_1', 'category_2'])['values'].mean().reset_index()
    # Reshape: one row per category_2, one column per category_1
    matrix = tuples.pivot(index='category_2', columns='category_1', values='values').reset_index()
    display(matrix)  # display() is available in Jupyter/IPython; use print(matrix) elsewhere
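A minimal sketch, assuming a reasonably recent pandas: pivot_table can do the averaging and the reshaping in one call, so the separate groupby(...).mean() and pivot(...) steps in the loop collapse into a single step. Because the sample stores 'values' as strings, the sketch converts that column with pd.to_numeric first; the name matrices is just an illustrative choice.

```python
import pandas as pd

myData = pd.DataFrame({
    'dataset': ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog',
                'bird', 'bird', 'bird', 'bird'],
    'category_1': ['orange', 'orange', 'white', 'white', 'black', 'brown',
                   'brown', 'black', 'red', 'green', 'red', 'green'],
    'category_2': ['this_cat', 'that_cat', 'this_cat', 'that_cat',
                   'this_dog', 'that_dog', 'this_dog', 'that_dog',
                   'this_bird', 'that_bird', 'this_bird', 'that_bird'],
    'values': ['1', '8', '9', '2', '5', '4', '3', '10', '0', '2', '7', '9'],
})
# The sample stores numbers as strings; convert them so the mean is numeric.
myData['values'] = pd.to_numeric(myData['values'])

matrices = {}  # illustrative name: one matrix per dataset
for name, animals in myData.groupby('dataset'):
    # pivot_table groups by index/columns and averages duplicates in one step,
    # replacing the separate groupby(...).mean() + pivot(...) calls.
    matrices[name] = animals.pivot_table(
        index='category_2', columns='category_1',
        values='values', aggfunc='mean',
    ).reset_index()

for name, matrix in matrices.items():
    print(name)
    print(matrix)
```

Note that pivot_table drops columns whose entries are all NaN by default (dropna=True); that does not affect this toy data.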

Here I group my data by the 'dataset' column (the animals) and convert each group into a matrix. However, because the column names differ from one matrix to the next, I am having trouble saving the output into an external empty list or dataframe.

For example, I'd like to save each matrix into a separate dataframe, with the number of dataframes generated dynamically from the number of groups in my data:

output_dfs = {k: pd.DataFrame([]) for k in myData['dataset']}
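The pre-allocated dictionary above can be used as-is: assigning a matrix to an existing key simply replaces the empty placeholder, so the differing column names never have to line up. A sketch under that assumption, with the string values converted to numbers and .unique() used so each group name is visited once when building the placeholders:

```python
import pandas as pd

myData = pd.DataFrame({
    'dataset': ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog',
                'bird', 'bird', 'bird', 'bird'],
    'category_1': ['orange', 'orange', 'white', 'white', 'black', 'brown',
                   'brown', 'black', 'red', 'green', 'red', 'green'],
    'category_2': ['this_cat', 'that_cat', 'this_cat', 'that_cat',
                   'this_dog', 'that_dog', 'this_dog', 'that_dog',
                   'this_bird', 'that_bird', 'this_bird', 'that_bird'],
    'values': ['1', '8', '9', '2', '5', '4', '3', '10', '0', '2', '7', '9'],
})
myData['values'] = pd.to_numeric(myData['values'])

# One empty placeholder per distinct group name.
output_dfs = {k: pd.DataFrame() for k in myData['dataset'].unique()}

for name, animals in myData.groupby('dataset'):
    means = animals.groupby(['category_1', 'category_2'])['values'].mean().reset_index()
    # Assigning to the key replaces the empty placeholder outright, so it
    # does not matter that each group's matrix has different columns.
    output_dfs[name] = means.pivot(index='category_2', columns='category_1',
                                   values='values').reset_index()

print(output_dfs['dog'])
```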

The desired output in this case is 3 separate dataframes that I can access by name (the values are based on the toy dataset):

dataset: bird
category_1 category_2  green  red
0           that_bird    5.5  NaN
1           this_bird    NaN  3.5

dataset: cat
category_1 category_2  orange  white
0            that_cat     8.0    2.0
1            this_cat     1.0    9.0

dataset: dog
category_1 category_2  black  brown
0            that_dog   10.0    4.0
1            this_dog    5.0    3.0
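If a single external dataframe is preferred over separate ones, one option (not something the question asks for directly) is to concatenate the per-dataset matrices with pd.concat, which labels each block with its dataset name much like the layout above. This is a sketch; combined and bird are illustrative names.

```python
import pandas as pd

myData = pd.DataFrame({
    'dataset': ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog',
                'bird', 'bird', 'bird', 'bird'],
    'category_1': ['orange', 'orange', 'white', 'white', 'black', 'brown',
                   'brown', 'black', 'red', 'green', 'red', 'green'],
    'category_2': ['this_cat', 'that_cat', 'this_cat', 'that_cat',
                   'this_dog', 'that_dog', 'this_dog', 'that_dog',
                   'this_bird', 'that_bird', 'this_bird', 'that_bird'],
    'values': ['1', '8', '9', '2', '5', '4', '3', '10', '0', '2', '7', '9'],
})
myData['values'] = pd.to_numeric(myData['values'])

# One matrix per dataset, indexed by category_2 (no reset_index here).
matrices = {
    name: animals.pivot_table(index='category_2', columns='category_1',
                              values='values', aggfunc='mean')
    for name, animals in myData.groupby('dataset')
}

# Concatenating a dict uses its keys as an outer index level named 'dataset';
# the union of all column names is kept and missing cells become NaN.
combined = pd.concat(matrices, names=['dataset'])

# Pull one group back out and drop the columns that belong to other groups.
bird = combined.loc['bird'].dropna(axis=1, how='all')
print(combined)
print(bird)
```

The trade-off is that the combined frame carries NaN in every column that a given dataset does not use, which is why the extraction step drops all-NaN columns.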

Answer:

I'm not sure what you mean. Is this the result you want to achieve?

import pandas as pd

myData = pd.DataFrame({'dataset': ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog', 'bird', 'bird', 'bird', 'bird'], 
                  'category_1': ['orange', 'orange', 'white', 'white', 'black', 'brown', 'brown', 'black', 'red', 'green', 'red', 'green'], 
                  'category_2': ['this_cat', 'that_cat', 'this_cat', 'that_cat', 'this_dog', 'that_dog', 'this_dog', 'that_dog', 'this_bird', 'that_bird', 'this_bird', 'that_bird'],
                  'values': ['1', '8', '9', '2', '5', '4', '3', '10', '0', '2', '7', '9']
                 })
# Convert the string values to numbers so that mean() actually averages them
myData['values'] = pd.to_numeric(myData['values'])

result = {}
for i, animals in myData.groupby('dataset'):
    tuples = animals.groupby(['category_1', 'category_2'])['values'].mean().reset_index()
    matrix = tuples.pivot(index='category_2', columns='category_1', values='values').reset_index()
    # Store each matrix under its group name; differing columns are no problem in a dict
    result[i] = matrix

# Each matrix is now available by name, e.g. result['cat']
for name, matrix in result.items():
    print(name)
    print(matrix)
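The loop in this answer can also be written as a dictionary comprehension. Since the question title asks for a list, the sketch below also shows one way to get a list from the dictionary while keeping the group names alongside it; to_matrix is a hypothetical helper name introduced here for illustration.

```python
import pandas as pd

myData = pd.DataFrame({
    'dataset': ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog',
                'bird', 'bird', 'bird', 'bird'],
    'category_1': ['orange', 'orange', 'white', 'white', 'black', 'brown',
                   'brown', 'black', 'red', 'green', 'red', 'green'],
    'category_2': ['this_cat', 'that_cat', 'this_cat', 'that_cat',
                   'this_dog', 'that_dog', 'this_dog', 'that_dog',
                   'this_bird', 'that_bird', 'this_bird', 'that_bird'],
    'values': ['1', '8', '9', '2', '5', '4', '3', '10', '0', '2', '7', '9'],
})
myData['values'] = pd.to_numeric(myData['values'])


def to_matrix(group: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: average 'values' per pair, then pivot to a matrix."""
    means = group.groupby(['category_1', 'category_2'])['values'].mean().reset_index()
    return means.pivot(index='category_2', columns='category_1',
                       values='values').reset_index()


# Dictionary keyed by group name: result['bird'], result['cat'], result['dog']
result = {name: to_matrix(group) for name, group in myData.groupby('dataset')}

# If a plain list is preferred, keep the names in a parallel list.
names = list(result)
matrices = list(result.values())

print(result['cat'])
```

A dictionary is usually the more convenient container here, because each matrix can be looked up by its dataset name instead of by position.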