Home > Enterprise >  Dataframe with metrics per row and stimuli average per column. How to do it?
Dataframe with metrics per row and stimuli average per column. How to do it?

Time:03-17

I have a very trivial problem but I don't know well the Pandas library and I don't know how to operate.

I have a dataframe where there is the Id of the subjects, and some metrics (A, B, C) repeated for the number of stimuli (Stim_1, Stim_2, etc..)

I need to create from this dataframe another dataframe composed of the averages of each stimulus for the individual metrics. Till now i wrote

 df_mean = pd.DataFrame()
 for col in df:
 m = df[col].mean()

Here the original dataframe

Here the original dataframe (df)

and

here the dataframe I want to create

here the dataframe I want to create

CodePudding user response:

You can create MultiIndex by all columns without ID by split, so possible reshape by DataFrame.stack with aggregate mean by second level:

print (df)
   ID  A_Stim1  A_Stim2  A_Stim3  B_Stim1  B_Stim2  B_Stim3
0   1        1        5        6        9        8        7
1   8        1        6        4        5        4        5

df1 = df.set_index('ID')
df1.columns = df1.columns.str.split('_', expand=True, n=1)
df2 = df1.stack(0).groupby(level=1).mean()
print (df2)
   Stim1  Stim2  Stim3
A    1.0    5.5    5.0
B    7.0    6.0    6.0

CodePudding user response:

You can use pandas.wide_to_long:

cols = ['A', 'B', 'C']
# or generic
cols = df.columns.str.extract('([^_] )_', expand=False).unique()

(pd
 .wide_to_long(df.reset_index(), cols,
               i='index', j='id2', sep='_',
               suffix='Stim\d ') # can also be generic '. '
 .groupby(level='id2').mean()
 .T
)

output (input of same shape as provided, filled with 1s):

id2  Stim1  Stim2  Stim3
A      1.0    1.0    1.0
B      1.0    1.0    1.0
C      1.0    1.0    1.0
  • Related