I have a list of pandas dataframes, each with two columns. The first column holds an ID and the second holds a value. How can I combine these dataframes so that each ID appears once, with its value replaced by the median of all values sharing that ID?
E.g.:
df_1 = pd.DataFrame({'#id': [1,2,3,4], 'values': [1,3,4,3]})
df_2 = pd.DataFrame({'#id': [1,2,3,5], 'values': [2,5,7,6]})
df_3 = pd.DataFrame({'#id': [1,2,4,5], 'values': [5,6,7,8]})
I would like the resulting new dataframe to be:
answer = pd.DataFrame({'#id': [1,2,3,4,5], 'values': [2,5,5.5,5,7]})
CodePudding user response:
Concatenate all the dataframes into one, then group by id and calculate the median of each group.
df = pd.concat([df_1,df_2,df_3])
df = df.groupby('#id').agg({'values':'median'})
'''
#id values
1 2.0
2 5.0
3 5.5
4 5.0
5 7.0
'''
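Since the question starts from a list of dataframes, the same idea can be wrapped in a small helper. This is only a sketch: the function name `combine_by_median` and its parameters are made up for illustration, and it assumes every dataframe uses the same two column names. Passing `as_index=False` keeps `#id` as a regular column, which matches the layout of the desired `answer`.

```python
import pandas as pd


def combine_by_median(frames, id_col="#id", value_col="values"):
    """Stack the frames and return one row per ID with the median value.

    Hypothetical helper; assumes all frames share id_col and value_col.
    """
    stacked = pd.concat(frames, ignore_index=True)
    # as_index=False keeps the ID as a normal column instead of the index
    return stacked.groupby(id_col, as_index=False)[value_col].median()


df_1 = pd.DataFrame({'#id': [1, 2, 3, 4], 'values': [1, 3, 4, 3]})
df_2 = pd.DataFrame({'#id': [1, 2, 3, 5], 'values': [2, 5, 7, 6]})
df_3 = pd.DataFrame({'#id': [1, 2, 4, 5], 'values': [5, 6, 7, 8]})

result = combine_by_median([df_1, df_2, df_3])
print(result)
```

With the three example frames, `result` should have the same `#id` and `values` columns as the `answer` dataframe in the question.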
Write to Excel:
df.reset_index().to_excel('give_an_excel_name.xlsx',index=False)
#or
df.to_excel('give_an_excel_name.xlsx')