Home > front end >  Pandas groupby and compute ratio of values with NA in multiple columns
Pandas groupby and compute ratio of values with NA in multiple columns

Time:05-25

I have a dataframe like as below

id,status,amount,qty
1,pass,123,4500
1,pass,156,3210
1,fail,687,2137
1,fail,456,1236
2,pass,216,324
2,pass,678,241
2,nan,637,213
2,pass,213,543

df = pd.read_clipboard(sep=',')

I would like to do the below

a) Groupby id and compute the pass percentage for each id

b) Groupby id and compute the average amount for each id

So, I tried the below

df['amt_avg'] = df.groupby('id')['amount'].mean()
df['pass_pct'] = df.groupby('status').apply(lambda x: x['status']/ x['status'].count())
df['fail_pct'] = df.groupby('status').apply(lambda x: x['status']/ x['status'].count())

but this doesn't work.

I am having trouble in getting the pass percentage.

In my real data I have lot of columns like status for which I have to find these % distribution of a specific value (ex: pass)

I expect my output to be like as below

id,pass_pct,fail_pct,amt_avg
1,50,50,2770.75
2,75,0,330.25

CodePudding user response:

Use crosstab with replace missing values by nan with remove nan column and then add new column amt_avg by DataFrame.join:

s = df.groupby('id')['qty'].mean()

df  = (pd.crosstab(df['id'], df['status'].fillna('nan'), normalize=0)
          .drop('nan', 1)
          .mul(100)
          .join(s.rename('amt_avg')))

print (df)
    fail  pass  amt_avg
id                     
1   50.0  50.0  2770.75
2    0.0  75.0   330.25
  • Related