overwrite a dataframe inside a dictionary of dataframes

Time:03-21

Goal

I'm trying to create a main DataFrame and then, for each value in its "name" column, store a copy of the main DataFrame. As the "name" column grows, the number of DataFrames grows with it. I keep the copies in a dictionary of DataFrames so that I can reference each person's DataFrame easily.

Afterwards, I apply filters to each person's respective DataFrame.

Issues I am facing

  1. The code raises an error that mentions "name", referring to the column called "name". The error occurs on the last line of the code.

Code

import pandas as pd
import copy

df = {'name': ['bob', 'alex', 'ryan', 'andrew', 'goliath'], 'right_arm_length': [1, 2, 3, 8, 7], 'left_arm_length': [3, 4, 2, 5, 8]}
df = pd.DataFrame(data=df) #Create main dataframe

dictionary_df={}  #Create empty dictionary to store dataframes in

 
for index, value in df['name'].items():
    dictionary_df[value + '_df'] = copy.deepcopy(df)        #This will create a dataframe for each person
    for idx in range(len(df)):          #Filtration process begins here
        if idx != index:                #Prevent from comparing person to themselves
            dictionary_df[value + '_df'] =  dictionary_df[value + '_df'][dictionary_df[value + '_df']['name'] == value]['right_arm_length'] < dictionary_df[value + '_df']['left_arm_length'][idx]          ####**Issue is here**

Note: The last line is meant to update dictionary_df[value + '_df'] by applying a condition: the right_arm_length of the DataFrame's owner must be smaller than the left_arm_length of every other person. So for bob_df, it would compare bob's right_arm_length to the left_arm_length of all other people in bob_df.
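
A minimal sketch of why the error appears, assuming the dictionary keys are built as value + '_df'. The comparison on the last line produces a boolean Series, and assigning it back replaces the stored DataFrame. On the next pass of the inner loop, the lookup of 'name' runs against that Series, which has no such label. The names frame, result and error_name are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['bob', 'alex', 'ryan', 'andrew', 'goliath'],
    'right_arm_length': [1, 2, 3, 8, 7],
    'left_arm_length': [3, 4, 2, 5, 8],
})

frame = df.copy()

# Same expression as the last line of the question, for bob compared with idx = 1.
result = frame[frame['name'] == 'bob']['right_arm_length'] < frame['left_arm_length'][1]

# The comparison yields a one-element boolean Series, not a filtered DataFrame.
print(type(result).__name__)

# Storing it back and indexing with 'name' again is what fails on the next iteration.
frame = result
try:
    frame['name']
    error_name = None
except KeyError as exc:
    error_name = type(exc).__name__
print(error_name)
```

Because the stored object is no longer a DataFrame after the first comparison, the filter has to be applied to the DataFrame itself rather than assigned from the comparison result.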

CodePudding user response:

Can you please check if this is what you want?

dictionary_df = {}
for index, value in df['name'].items():
  dictionary_df[value + '_df'] = copy.deepcopy(df)
  right_arm_length = dictionary_df[value + '_df'][dictionary_df[value + '_df']['name']==value]['right_arm_length'].iloc[0]
  dictionary_df[value + '_df'].loc[dictionary_df[value + '_df']['name']!=value] = dictionary_df[value + '_df'][dictionary_df[value + '_df']['left_arm_length'] > right_arm_length]
  dictionary_df[value + '_df'] = dictionary_df[value + '_df'].dropna()

CodePudding user response:

You could create a boolean mask and iteratively filter df:

out = {}
for name in df['name']:
    msk = df['name']==name
    out[f'{name}_df'] = df[(df.loc[msk, 'right_arm_length'].iat[0] < df['left_arm_length']) | msk]

Output:

{'bob_df':       name  right_arm_length  left_arm_length
 0      bob                 1                3
 1     alex                 2                4
 2     ryan                 3                2
 3   andrew                 8                5
 4  goliath                 7                8,
 'alex_df':       name  right_arm_length  left_arm_length
 0      bob                 1                3
 1     alex                 2                4
 3   andrew                 8                5
 4  goliath                 7                8,
 'ryan_df':       name  right_arm_length  left_arm_length
 1     alex                 2                4
 2     ryan                 3                2
 3   andrew                 8                5
 4  goliath                 7                8,
 'andrew_df':      name  right_arm_length  left_arm_length
 3  andrew                 8                5,
 'goliath_df':       name  right_arm_length  left_arm_length
 4  goliath                 7                8}
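
The same filter can be wrapped in a small helper, which makes the output above easy to check. This is only a sketch: frames_by_name is a hypothetical function name, not part of pandas, and it assumes each name appears exactly once in df.

```python
import pandas as pd

def frames_by_name(df):
    """Return {'<name>_df': filtered copy of df} for each name (hypothetical helper)."""
    out = {}
    for name in df['name']:
        is_owner = df['name'] == name
        # The owner's right_arm_length as a scalar; assumes the name is unique.
        owner_right = df.loc[is_owner, 'right_arm_length'].iat[0]
        # Keep the owner plus everyone whose left arm is longer than that value.
        out[f'{name}_df'] = df[is_owner | (df['left_arm_length'] > owner_right)].copy()
    return out

df = pd.DataFrame({
    'name': ['bob', 'alex', 'ryan', 'andrew', 'goliath'],
    'right_arm_length': [1, 2, 3, 8, 7],
    'left_arm_length': [3, 4, 2, 5, 8],
})

frames = frames_by_name(df)
print(frames['ryan_df'])
```

Calling .copy() on each result keeps the per-person DataFrames independent of df, so later edits to one of them do not touch the main DataFrame.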

If you have Python >= 3.8, you could use the walrus operator and convert the above loop into a dict comprehension:

out = {f'{name}_df': df[(df.loc[(msk := df['name']==name), 'right_arm_length'].iat[0] < df['left_arm_length']) | msk] for name in df['name']}