Given a list of 2-column pandas dataframes, how can I take the median of the second columns?

Time:12-02

I have a list of pandas dataframes, each with 2 columns. The first column holds an ID and the second holds the values. How can I combine these dataframes so that each ID appears once, with its value replaced by the median of all values sharing that ID?

E.g.:

df_1 = pd.DataFrame({'#id': [1,2,3,4], 'values': [1,3,4,3]})
df_2 = pd.DataFrame({'#id': [1,2,3,5], 'values': [2,5,7,6]})
df_3 = pd.DataFrame({'#id': [1,2,4,5], 'values': [5,6,7,8]})

I would like the resulting new dataframe to be:

answer = pd.DataFrame({'#id': [1,2,3,4,5], 'values': [2,5,5.5,5,7]})

CodePudding user response:

Concatenate all the dataframes into one, then group by '#id' and take the median of each group.

import pandas as pd

# Stack the frames vertically, then take the median of 'values' per '#id'.
df = pd.concat([df_1, df_2, df_3])
df = df.groupby('#id').agg({'values': 'median'})
'''
     values
#id
1       2.0
2       5.0
3       5.5
4       5.0
5       7.0
'''

Write to Excel:

df.reset_index().to_excel('give_an_excel_name.xlsx', index=False)
#or
df.to_excel('give_an_excel_name.xlsx')
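
If the result should match the `answer` frame from the question exactly, with '#id' as an ordinary column rather than the index, something like the following may work. It is a minimal sketch, not the only way to do it: the list name `frames` and the variables `combined` and `result` are made up for illustration, and `as_index=False` is used in place of a later `reset_index()`.

```python
import pandas as pd

# Sample frames from the question.
df_1 = pd.DataFrame({'#id': [1, 2, 3, 4], 'values': [1, 3, 4, 3]})
df_2 = pd.DataFrame({'#id': [1, 2, 3, 5], 'values': [2, 5, 7, 6]})
df_3 = pd.DataFrame({'#id': [1, 2, 4, 5], 'values': [5, 6, 7, 8]})

# 'frames' is a hypothetical name for the list of dataframes.
frames = [df_1, df_2, df_3]

# Stack all rows into one frame; ignore_index avoids duplicate row labels.
combined = pd.concat(frames, ignore_index=True)

# as_index=False keeps '#id' as a regular column instead of the index.
result = combined.groupby('#id', as_index=False)['values'].median()

print(result)
```

Because the list is passed straight to `pd.concat`, the same code handles any number of dataframes, as long as they all share the '#id' and 'values' column names.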