Goal
I'm trying to create a main DataFrame and then, for each value in its "name" column, create a DataFrame that is a copy of the main one. As the "name" column grows, the number of DataFrames grows with it. I store them in a dictionary of DataFrames so that I can reference each person's DataFrame easily.
Afterwards, I apply filters to each person's respective DataFrame.
Issues I am facing
- The code raises a "name" error, referring to the column called "name". It occurs on the last line of the code.
Code
import pandas as pd
import copy

df = {'name': ['bob', 'alex', 'ryan', 'andrew', 'goliath'], 'right_arm_length': [1, 2, 3, 8, 7], 'left_arm_length': [3, 4, 2, 5, 8]}
df = pd.DataFrame(data=df)  # Create main dataframe
dictionary_df = {}  # Create empty dictionary to store dataframes in
for index, value in df['name'].items():
    dictionary_df[value + '_df'] = copy.deepcopy(df)  # This will create a dataframe for each person
    for idx in range(len(df)):  # Filtration process begins here
        if idx != index:  # Prevent from comparing person to themselves
            dictionary_df[value + '_df'] = dictionary_df[value + '_df'][dictionary_df[value + '_df']['name'] == value]['right_arm_length'] < dictionary_df[value + '_df']['left_arm_length'][idx]  # Issue is here
Note: The last line is meant to update dictionary_df[value + '_df'] by applying a condition: the right_arm_length of the DataFrame's owner must be smaller than the left_arm_length of each other person. So for bob_df, it would compare bob's right_arm_length to the left_arm_length of every other person in bob_df.
CodePudding user response:
Can you please check if this is what you want?
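dictionary_df = {}
for index, value in df['name'].items():
    dictionary_df[value + '_df'] = copy.deepcopy(df)
    # right_arm_length of the person who owns this DataFrame
    right_arm_length = dictionary_df[value + '_df'][dictionary_df[value + '_df']['name'] == value]['right_arm_length'].iloc[0]
    # Overwrite the other people's rows with the rows that pass the condition;
    # rows that do not pass are left as NaN after index alignment
    dictionary_df[value + '_df'].loc[dictionary_df[value + '_df']['name'] != value] = dictionary_df[value + '_df'][dictionary_df[value + '_df']['left_arm_length'] > right_arm_length]
    # Drop the NaN rows, i.e. the people who failed the condition
    dictionary_df[value + '_df'] = dictionary_df[value + '_df'].dropna()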
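For context on why the loop in the question fails, here is a minimal sketch, assuming the same df as in the question (person_df, result, and error are illustrative names only). The last line of the question's loop assigns the result of a comparison back into the dictionary, and that result appears to be a boolean Series rather than a DataFrame, so the next ['name'] lookup has no such key.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['bob', 'alex', 'ryan', 'andrew', 'goliath'],
    'right_arm_length': [1, 2, 3, 8, 7],
    'left_arm_length': [3, 4, 2, 5, 8],
})

person_df = df.copy()

# Same shape of expression as the question's last line, for bob against row 1
result = person_df[person_df['name'] == 'bob']['right_arm_length'] < person_df['left_arm_length'][1]

# The comparison yields a one-element boolean Series, not a DataFrame
result_type = type(result).__name__

# Treating that Series as the DataFrame fails on the 'name' lookup
try:
    result['name']
    error = None
except KeyError as exc:
    error = exc
```

Keeping the comparison in a separate variable and using it only as a filter, instead of storing it in dictionary_df, avoids losing the DataFrame.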
CodePudding user response:
You could create a boolean mask and iteratively filter df:
out = {}
for name in df['name']:
    msk = df['name'] == name
    out[f'{name}_df'] = df[(df.loc[msk, 'right_arm_length'].iat[0] < df['left_arm_length']) | msk]
Output:
{'bob_df':       name  right_arm_length  left_arm_length
0      bob                 1                3
1     alex                 2                4
2     ryan                 3                2
3   andrew                 8                5
4  goliath                 7                8,
 'alex_df':       name  right_arm_length  left_arm_length
0      bob                 1                3
1     alex                 2                4
3   andrew                 8                5
4  goliath                 7                8,
 'ryan_df':       name  right_arm_length  left_arm_length
1     alex                 2                4
2     ryan                 3                2
3   andrew                 8                5
4  goliath                 7                8,
 'andrew_df':      name  right_arm_length  left_arm_length
3  andrew                 8                5,
 'goliath_df':       name  right_arm_length  left_arm_length
4  goliath                 7                8}
If you have Python >= 3.8, you could use the walrus operator and convert the above loop into a dict comprehension:
out = {f'{name}_df': df[(df.loc[(msk := df['name']==name), 'right_arm_length'].iat[0] < df['left_arm_length']) | msk] for name in df['name']}
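To try the mask approach end to end, here is a self-contained sketch that wraps it in a function. The helper name filter_per_person and the variable names is_owner, owner_right, frames, and kept are hypothetical, chosen for illustration; the data is the df from the question.

```python
import pandas as pd

def filter_per_person(df):
    """Hypothetical helper: build one filtered DataFrame per name."""
    out = {}
    for name in df['name']:
        # True only on the row that belongs to this person
        is_owner = df['name'] == name
        # This person's right_arm_length as a scalar
        owner_right = df.loc[is_owner, 'right_arm_length'].iat[0]
        # Keep rows whose left_arm_length exceeds it, plus the owner's own row
        out[f'{name}_df'] = df[(owner_right < df['left_arm_length']) | is_owner]
    return out

df = pd.DataFrame({
    'name': ['bob', 'alex', 'ryan', 'andrew', 'goliath'],
    'right_arm_length': [1, 2, 3, 8, 7],
    'left_arm_length': [3, 4, 2, 5, 8],
})

frames = filter_per_person(df)

# Names that survive in each person's DataFrame
kept = {key: frame['name'].tolist() for key, frame in frames.items()}
```

Because this filters df directly with a boolean mask, no NaN rows are introduced, so the integer columns should keep their dtype.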