Perform unique row operation after a groupby-CodePudding

I have been stuck to a problem where I have done all the groupby operation and got the resultant dataframe as shown below but the problem came in last operation of calculation of one additional column

Current dataframe:

code        industry               category     count     duration
2       Retail                      Mobile        4         7
3       Retail                      Tab           2         33
3       Health                      Mobile        5         103
2       Food                         TV           1         88

The question: Want an additional column operation which calculates the ratio of count of industry 'retail' for the specific code column entry

for example: code 2 has 2 industry entry retail and food so operation column should have value 4/(4 1) = 0.8 and similarly for code3 as well as shown below

O/P:

code        industry               category     count     duration  operation
2       Retail                      Mobile        4         7         0.8
3       Retail                      Tab           2         33        -
3       Health                      Mobile        5         103       2/7 = 0.285
2       Food                         TV           1         88        -

Help on here as well that if I do just groupby I will miss out the information of category and duration also what would be better way to represent the output df there can been multiple industry and operation is limited to just retail

CodePudding user response：

I can't think of a single operation. But the way via a dictionary should work. Oh, and in advance for the other answerers the code to create the example dataframe.

st_l = [[2,'Retail','Mobile', 4, 7],
       [3,'Retail', 'Tab', 2, 33],
       [3,'Health', 'Mobile', 5, 103],
       [2,'Food', 'TV', 1, 88]]
df = pd.DataFrame(st_l, columns= 
     ['code','industry','category','count','duration'])

And now my attempt:

sums = df[['code', 'count']].groupby('code').sum().to_dict()['count']
df['operation'] = df.apply(lambda x: x['count']/sums[x['code']], axis=1)

CodePudding user response：

You can create a new column with the total count of each code using groupby.transform(), and then use loc to find only the rows that have as their industry 'Retail' and perform your division:

df['total_per_code'] = df.groupby(['code'])['count'].transform('sum')
df.loc[df.industry.eq('Retail'), 'operation'] = df['count'].div(df.total_per_code)

df.drop('total_per_code',axis=1,inplace=True)

prints back:

  code industry category  count  duration  operation
0     2   Retail   Mobile      4         7   0.800000
1     3   Retail      Tab      2        33   0.285714
2     3   Health   Mobile      5       103        NaN
3     2     Food       TV      1        88        NaN