pandas: after groupby, apply a different function to each group and combine the results back

Time:09-24

I have a df; you can recreate it by copying this code:

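import pandas as pd
from io import StringIO

df = """
  ValOption  RB
0       SLA  4
1       AC   5
2       SLA  5
3       PG   5
4       SLA  5
5       PC   4
6       SLA  4

"""
# Columns are separated by one or more whitespace characters
df = pd.read_csv(StringIO(df.strip()), sep=r'\s+')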

First I group the df:

grep=df.groupby('ValOption')

Then I will have 3 groups:

SLA AC PG

Now I want to apply 3 different functions to these 3 groups.

def sla_func(df):
  return df['RB']*3 + df['RB']

def ac_func(df):
  return df['RB']/4*df['RB'] + 1

def pg_func(df):
  return df['RB']-5

The results should then become the values of a new column named group_y.

The output should look like:

  ValOption  RB  group_y
0       SLA   4       16
1        AC   5     7.25
2       SLA   5       20
3        PG   5        0
4       SLA   5       20
5        AC   4        5
6       SLA   4       16

Since the real business logic involves tens of thousands of rows, I think using groupby may be faster.

I tried:

grp=df.groupby(['ValOption'])
sla=grp.get_group(('SLA').apply(sla(df))
ac=grp.get_group(('AC').apply(ac(df))
pg=grp.get_group(('PG').apply(pg(df))

But it does not work. Can anyone help?

Note that in my real business case the functions are extremely complicated, so I need a general way to solve this. I want to try groupby because speed is very important to me. Thank you so much!

CodePudding user response:

No need for groupby, just map ValOption to different factors and then multiply it with RB:

df['group_y'] = df.ValOption.map({'SLA': 3, 'AC': 4, 'PG': 5}) * df.RB

df    
  ValOption  RB  group_y
0       SLA   4       12
1        AC   5       20
2       SLA   5       15
3        PG   5       25
4       SLA   5       15
5        AC   4       16
6       SLA   4       12

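If the function is complex, you can decide which function to invoke based on g.name in groupby.apply (here sla, ac and pg stand for your own per-group functions, each taking the group's DataFrame and returning a Series with the same index):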

group_funcs = {'SLA': sla, 'AC': ac, 'PG': pg}
df['group_y'] = df.groupby('ValOption', group_keys=False).apply(lambda g: group_funcs[g.name](g))

df
  ValOption  RB  group_y
0       SLA   4       12
1        AC   5       20
2       SLA   5       15
3        PG   5       25
4       SLA   5       15
5        AC   4       16
6       SLA   4       12
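
As a minimal sketch of the same dispatch-by-group-name idea, the snippet below uses the question's own formulas, read as RB*3 + RB, RB/4*RB + 1 and RB - 5, since those reproduce the expected group_y values. Two assumptions are made here: row 5 is labelled AC, as in the question's expected output, and the explicit loop with pd.concat is just one way to stitch the pieces back together, not necessarily the fastest.

```python
import pandas as pd

# Sample data; row 5 is labelled 'AC' as in the question's expected output (assumption).
df = pd.DataFrame({
    "ValOption": ["SLA", "AC", "SLA", "PG", "SLA", "AC", "SLA"],
    "RB": [4, 5, 5, 5, 5, 4, 4],
})

def sla_func(g):
    return g["RB"] * 3 + g["RB"]

def ac_func(g):
    return g["RB"] / 4 * g["RB"] + 1

def pg_func(g):
    return g["RB"] - 5

# Map each group label to the function that should handle it.
group_funcs = {"SLA": sla_func, "AC": ac_func, "PG": pg_func}

# Each function only sees the rows of its own group. The returned Series
# keep the original row index, so concatenating them and assigning the
# result lines every value up with the right row.
pieces = [
    group_funcs[name](g)
    for name, g in df.groupby("ValOption")
    if name in group_funcs  # labels without a function are skipped (NaN in the result)
]
df["group_y"] = pd.concat(pieces)

print(df)
```

Because the assignment aligns on the index, the order in which groupby yields the groups does not matter. The column ends up as float because ac_func produces fractional values.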

CodePudding user response:

You can do this with .apply along axis=1 and put the per-group logic in a custom row function:

import pandas as pd
import numpy as np
from io import StringIO

def col_group_y(row):
    if row['ValOption']=='SLA':
        return row['RB']*3
    elif row['ValOption']=='AC':
        return row['RB']*4
    elif row['ValOption']=='PG':
        return row['RB']*5
    else:
        return row['RB']



df = """
  ValOption  RB
0       SLA  4
1       AC   5
2       SLA  5
3       PG   5
4       SLA  5
5       PC   4
6       SLA  4

"""
df = pd.read_csv(StringIO(df.strip()), sep=r'\s+')
df['group_y']=df.apply(col_group_y,axis=1)
print(df)

Result:

  ValOption  RB  group_y
0       SLA   4       12
1        AC   5       20
2       SLA   5       15
3        PG   5       25
4       SLA   5       15
5        PC   4        4
6       SLA   4       12
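
Row-wise apply calls the Python function once per row, which may be slow on tens of thousands of rows. A possible vectorized variant of the same rules is sketched below, using boolean masks and .loc. The rules dictionary and its lambdas are illustrative names, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "ValOption": ["SLA", "AC", "SLA", "PG", "SLA", "PC", "SLA"],
    "RB": [4, 5, 5, 5, 5, 4, 4],
})

# Hypothetical per-label rules mirroring col_group_y; each takes the
# sub-frame for one label and returns a Series with the same index.
rules = {
    "SLA": lambda part: part["RB"] * 3,
    "AC": lambda part: part["RB"] * 4,
    "PG": lambda part: part["RB"] * 5,
}

# Default value, equivalent to the final else branch: keep RB unchanged.
df["group_y"] = df["RB"].copy()

for label, rule in rules.items():
    mask = df["ValOption"] == label
    # Only the rows carrying this label are passed to the rule and overwritten.
    df.loc[mask, "group_y"] = rule(df.loc[mask])

print(df)
```

Each rule runs once per label on a whole block of rows instead of once per row, and labels without a rule (PC here) keep the default.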

CodePudding user response:

Maybe this solution, using np.select:

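import numpy as np
import pandas as pd
from io import StringIO

df = """
  ValOption  RB
0       SLA  4
1       AC   5
2       SLA  5
3       PG   5
4       SLA  5
5       PC   4
6       SLA  4

"""
def sla_func(df):
  return df['RB']*3 + df['RB']

def ac_func(df):
  return df['RB']/4*df['RB'] + 1

def pg_func(df):
  return df['RB']-5

df = pd.read_csv(StringIO(df.strip()), sep=r'\s+')

col         = 'ValOption'
conditions  = [ df[col] =='SLA', df[col] =='AC', df[col] =='PG' ]
# Each function is evaluated on the whole frame; np.select then picks,
# row by row, the value from the first condition that is True
choices     = [ sla_func(df), ac_func(df), pg_func(df) ]

# Rows matching no condition (such as 'PC') get NaN
df["group_y"] = np.select(conditions, choices, default=np.nan)

print(df)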