This question is based on my previous question.
I have a Pandas DataFrame like the one below. For each of the columns r1 through r50, I want to calculate the mean over the rows where '5' occurs in the corresponding s column (r1-s1, r2-s2, ..., r50-s50).
s1   ...  s50  r1    ...  r50
5         5    0.5        1
1         5    0.43       0.5
5         1    1          0.43
5         5    1          1
In this case, 5 occurs three times in s1, so we take the average of the matching r1 values: (0.5 + 1 + 1) / 3 = 0.83. In s50, 5 also occurs three times, so we take the average of the matching r50 values: (1 + 0.5 + 1) / 3 = 0.83. I want to get the result in a new DataFrame. Can someone help me calculate this? Thanks!
CodePudding user response:
You can filter for the s columns (like='s' keeps every column whose name contains 's', which here is only the s columns). Then, for each column, select the indexes where the item is 5, select those rows from the column of the same name except with 's' replaced by 'r', and compute the mean:
s = df.filter(like='s').apply(lambda col: df.loc[col == 5, col.name.replace('s', 'r')].mean())
Output:
>>> s
s1 0.833333
s50 0.833333
dtype: float64
>>> s['s1']
0.8333333333333334
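Since the question asks for the result in a new DataFrame, here is a self-contained sketch of the same idea. It rests on a few assumptions: the sample frame contains only the two column pairs visible in the question (s1/r1 and s50/r50), the regex filter is a stricter stand-in for like='s' in case other column names also contain an 's', and the output column name mean_when_5 is made up for illustration.

```python
import pandas as pd

# Sample data from the question, reduced to the two visible column pairs.
df = pd.DataFrame({
    "s1": [5, 1, 5, 5],
    "s50": [5, 5, 1, 5],
    "r1": [0.5, 0.43, 1, 1],
    "r50": [1, 0.5, 0.43, 1],
})

# Keep only columns named "s<number>", so unrelated names containing
# an "s" are not picked up by accident.
s_cols = df.filter(regex=r"^s\d+$")

# For each s column, average the paired r column over the rows where s == 5.
means = s_cols.apply(lambda col: df.loc[col == 5, "r" + col.name[1:]].mean())

# Relabel the index from s<number> to r<number> and wrap it in a DataFrame.
# "mean_when_5" is a hypothetical column name chosen for this example.
result = means.rename(lambda name: "r" + name[1:]).to_frame("mean_when_5")

# Alternative without apply: blank out r values where s != 5, then take
# the column means (NaN values are skipped by default).
r_cols = df[["r" + name[1:] for name in s_cols.columns]]
vectorized = r_cols.where(s_cols.eq(5).to_numpy()).mean()

print(result)
```

Both variants pair the columns by the numeric suffix, so they assume that every s column has an r column with the same number. With 50 column pairs the where-based version avoids calling a Python function per column, though for a frame this size either should be fine.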