I have a DataFrame called medal. It has a column called 'event_gender' with 4 unique values (Men, Women, Open, and Mixed). I wrote a function that filters on one of these values and then groups the result.
I would like to use a for loop to create and name the resulting DataFrames, if possible.
Here is what I have so far, and it works.
def gender(df, sex):
    temp_df = df.loc[df['event_gender']==sex]
    last_df = temp_df.groupby(['country_3_letter_code', 'medal_type'])['event_gender'].count()
    return last_df
Men = gender(medal, 'Men')
Women = gender(medal, 'Women')
Mixed = gender(medal, 'Mixed')
Open = gender(medal, 'Open')
But in the code above I am naming every DataFrame separately. Is there an easier way to name these four DataFrames? For example:
for item in medal['event_gender'].unique():
    item = gender(medal, item)
CodePudding user response:
Yes, you could just do:
for item in medal['event_gender'].unique():
    globals()[item] = gender(medal, item)
But why do this? Keep your DataFrame as it is and work on it with groupings. It is easier to run the same computation on different groups of one DataFrame than to repeat that computation on separate DataFrames.
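If you still want one result per value, a dictionary keyed by the value is a common alternative to writing into globals(). The following is only a minimal sketch: the sample rows in medal are made up for illustration, and by_gender is a hypothetical name, not something from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the real `medal` DataFrame.
medal = pd.DataFrame({
    "event_gender": ["Men", "Men", "Women", "Mixed", "Open", "Women"],
    "country_3_letter_code": ["USA", "USA", "GBR", "USA", "GBR", "GBR"],
    "medal_type": ["GOLD", "GOLD", "SILVER", "BRONZE", "GOLD", "SILVER"],
})

def gender(df, sex):
    """Count medals per country and medal type for one event_gender value."""
    temp_df = df.loc[df["event_gender"] == sex]
    return temp_df.groupby(["country_3_letter_code", "medal_type"])["event_gender"].count()

# One dictionary entry per unique value, instead of one global variable each.
by_gender = {item: gender(medal, item) for item in medal["event_gender"].unique()}

# Look up a result by its key rather than by a variable name.
men = by_gender["Men"]
```

The loop in the question rebinds item on every pass, so nothing is kept; with a dictionary each result stays reachable under its own key, and you can iterate over by_gender.items() later.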
CodePudding user response:
If you want to work on each unique value in the event_gender column, you could apply an aggregation function to the grouped data like so:
gender_grouping = medal.groupby(['event_gender']).agg({'country_3_letter_code': 'value_counts',
                                                       'medal_type': 'value_counts'})
After this, you can retrieve the event_gender value you are interested in simply by:
gender_grouping.loc['Men']
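If what you need is the joint count per country and medal type, as in the gender function from the question, one possible sketch is a single groupby over all three columns. The sample rows below are made up for illustration, and counts is a hypothetical name.

```python
import pandas as pd

# Hypothetical sample data standing in for the real `medal` DataFrame.
medal = pd.DataFrame({
    "event_gender": ["Men", "Men", "Women", "Mixed", "Open", "Women"],
    "country_3_letter_code": ["USA", "USA", "GBR", "USA", "GBR", "GBR"],
    "medal_type": ["GOLD", "GOLD", "SILVER", "BRONZE", "GOLD", "SILVER"],
})

# Count rows for every (event_gender, country, medal_type) combination at once.
counts = medal.groupby(
    ["event_gender", "country_3_letter_code", "medal_type"]
).size()

# Selecting on the first index level leaves a Series indexed by
# (country_3_letter_code, medal_type) for that event_gender value.
men = counts.loc["Men"]
```

Here counts.loc['Women'], counts.loc['Mixed'] and counts.loc['Open'] work the same way, so no separate variables are needed.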