I have a DataFrame; you can reproduce it by copying this code:
import pandas as pd
from io import StringIO
df = """
ValOption RB
0 SLA 4
1 AC 5
2 SLA 5
3 PG 5
4 SLA 5
5 PC 4
6 SLA 4
"""
df = pd.read_csv(StringIO(df.strip()), sep=r'\s+')
First I group the df:
grep=df.groupby('ValOption')
This gives me 3 groups:
SLA AC PG
Now I want to apply a different function to each of these 3 groups:
def sla_func(df):
    return df['RB']*3 + df['RB']

def ac_func(df):
    return df['RB']/4*df['RB'] + 1

def pg_func(df):
    return df['RB'] - 5
The results should then become the values of a new column named group_y.
The output should look like:
  ValOption  RB  group_y
0       SLA   4       16
1        AC   5     7.25
2       SLA   5       20
3        PG   5        0
4       SLA   5       20
5        AC   4        5
6       SLA   4       16
Since the real data has tens of thousands of rows, I think using groupby may be faster.
I tried:
grp=df.groupby(['ValOption'])
sla=grp.get_group(('SLA').apply(sla(df))
ac=grp.get_group(('AC').apply(ac(df))
pg=grp.get_group(('PG').apply(pg(df))
But it does not work. Can anyone help?
Note that in my real use case the functions are extremely complicated, so I need a general approach. I want to try groupby because speed is very important to me. Thank you so much!
CodePudding user response:
No need for groupby; just map ValOption to different factors and then multiply the result by RB:
df['group_y'] = df.ValOption.map({'SLA': 3, 'AC': 4, 'PG': 5}) * df.RB
df
  ValOption  RB  group_y
0       SLA   4       12
1        AC   5       20
2       SLA   5       15
3        PG   5       25
4       SLA   5       15
5        AC   4       16
6       SLA   4       12
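One caveat with the mapping approach: a label that is missing from the dictionary (such as PC in the question's sample data) maps to NaN. Below is a minimal sketch, assuming that unmapped labels should keep their RB value unchanged. That default is an assumption for illustration, not something the question specifies, and the DataFrame is rebuilt by hand from the question's sample.

```python
import pandas as pd

# Sample data rebuilt by hand from the question, including the PC row.
df = pd.DataFrame({
    "ValOption": ["SLA", "AC", "SLA", "PG", "SLA", "PC", "SLA"],
    "RB": [4, 5, 5, 5, 5, 4, 4],
})

factors = {"SLA": 3, "AC": 4, "PG": 5}

# map() yields NaN for labels not in the dict; fillna(1) is an assumed
# default that leaves RB unchanged for those rows.
df["group_y"] = df["ValOption"].map(factors).fillna(1) * df["RB"]
print(df)
```

If unmapped labels should instead stay missing, dropping the fillna call leaves NaN in those rows.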
If the function is complex, you can decide which function to invoke based on g.name in groupby.apply:
group_funcs = {'SLA': sla, 'AC': ac, 'PG': pg}
df['group_y'] = df.groupby('ValOption', group_keys=False).apply(lambda g: group_funcs[g.name](g))
df
  ValOption  RB  group_y
0       SLA   4       12
1        AC   5       20
2       SLA   5       15
3        PG   5       25
4       SLA   5       15
5        AC   4       16
6       SLA   4       12
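The same name-based dispatch can also be written as an explicit loop over the groups, which may be easier to debug when the per-group functions are complicated. This is only a sketch under stated assumptions: the three functions are the ones from the question, row 5 is labelled AC to match the question's expected output, and labels without a registered function are simply left as NaN.

```python
import pandas as pd

def sla_func(g):
    return g["RB"] * 3 + g["RB"]

def ac_func(g):
    return g["RB"] / 4 * g["RB"] + 1

def pg_func(g):
    return g["RB"] - 5

# Sample data; row 5 uses AC (assumed) so the result matches the
# expected output shown in the question.
df = pd.DataFrame({
    "ValOption": ["SLA", "AC", "SLA", "PG", "SLA", "AC", "SLA"],
    "RB": [4, 5, 5, 5, 5, 4, 4],
})

group_funcs = {"SLA": sla_func, "AC": ac_func, "PG": pg_func}

pieces = []
for name, g in df.groupby("ValOption"):
    func = group_funcs.get(name)
    if func is not None:
        # Each function runs once per group on the whole sub-frame.
        pieces.append(func(g))

# The pieces keep the original row index, so the assignment aligns
# each value with the row it came from.
df["group_y"] = pd.concat(pieces)
print(df)
```

Each function is called once per group rather than once per row, so the per-call overhead stays small even with tens of thousands of rows.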
CodePudding user response:
You can use .apply with axis=1 and put the per-option logic in a custom row-wise function:
import pandas as pd
import numpy as np
from io import StringIO
def col_group_y(row):
    if row['ValOption'] == 'SLA':
        return row['RB']*3
    elif row['ValOption'] == 'AC':
        return row['RB']*4
    elif row['ValOption'] == 'PG':
        return row['RB']*5
    else:
        return row['RB']
df = """
ValOption RB
0 SLA 4
1 AC 5
2 SLA 5
3 PG 5
4 SLA 5
5 PC 4
6 SLA 4
"""
df = pd.read_csv(StringIO(df.strip()), sep=r'\s+')
df['group_y']=df.apply(col_group_y,axis=1)
print(df)
Result:
  ValOption  RB  group_y
0       SLA   4       12
1        AC   5       20
2       SLA   5       15
3        PG   5       25
4       SLA   5       15
5        PC   4        4
6       SLA   4       12
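Note that apply with axis=1 calls the Python function once per row, which can become slow on large frames. Since the question stresses speed, here is a sketch of the same rules applied one label at a time to whole columns via boolean masks. The rules dictionary is a hypothetical helper that mirrors the if/elif chain above, and the DataFrame is rebuilt by hand from the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "ValOption": ["SLA", "AC", "SLA", "PG", "SLA", "PC", "SLA"],
    "RB": [4, 5, 5, 5, 5, 4, 4],
})

# Hypothetical rule table mirroring the if/elif chain in col_group_y.
rules = {
    "SLA": lambda rb: rb * 3,
    "AC": lambda rb: rb * 4,
    "PG": lambda rb: rb * 5,
}

# Start from RB so that unmatched labels keep their value (the else branch).
df["group_y"] = df["RB"]
for label, func in rules.items():
    mask = df["ValOption"] == label
    # One vectorized call per label instead of one Python call per row.
    df.loc[mask, "group_y"] = func(df.loc[mask, "RB"])
print(df)
```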
CodePudding user response:
Maybe this solution:
df = """
ValOption RB
0 SLA 4
1 AC 5
2 SLA 5
3 PG 5
4 SLA 5
5 PC 4
6 SLA 4
"""
import numpy as np
import pandas as pd
from io import StringIO

def sla_func(df):
    return df['RB']*3 + df['RB']

def ac_func(df):
    return df['RB']/4*df['RB'] + 1

def pg_func(df):
    return df['RB'] - 5

df = pd.read_csv(StringIO(df.strip()), sep=r'\s+')
col = 'ValOption'
conditions = [ df[col] =='SLA', df[col] =='AC', df[col] =='PG' ]
choices = [ sla_func(df), ac_func(df), pg_func(df) ]
df["group_y"] = np.select(conditions, choices, default=np.nan)
print(df)