I have a df; you can create it by running this code:
import numpy as np
import pandas as pd
from io import StringIO
dfs = """
M0 M1 M2 M3 M4 M5 age
1 1 2 3 4 5 6 3.2
2 7 5 4 5 8 3 4.5
3 4 8 9 3 5 2 6.7
"""
df = pd.read_csv(StringIO(dfs.strip()), sep=r'\s+')
Based on the business logic, I have the following function, and its output is what I expect:
def func(M0,M1,M2,M3,M4,M5,age):
    newcol=np.prod([M0,M1,M2,M3,M4,M5][0:age])
    return newcol
vfunc = np.frompyfunc(func, 7, 1)
df['newcol']=vfunc(df['M0'].values,df['M1'].values,df['M2'].values,df['M3'].values,df['M4'].values,df['M5'].values,df['age'].values.astype(int))
df
Output is:
M0 M1 M2 M3 M4 M5 age newcol
1 1 2 3 4 5 6 3.2 6
2 7 5 4 5 8 3 4.5 700
3 4 8 9 3 5 2 6.7 8640
The problem is that def func(M0,M1,M2,M3,M4,M5,age) has too many parameters. Is there any way I can pass them as a list (or something similar) to make the function cleaner?
I tried:
def func(df):
    newcol=np.prod([df['M0'].values,df['M1'].values,df['M2'].values,df['M3'].values,df['M4'].values,df['M5'].values][0:df['age'].values.astype(int)])
    return newcol
vfunc = np.frompyfunc(func,1, 1)
df['newcol']=vfunc(df)
Error:
TypeError: 'float' object is not subscriptable
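The error comes from how np.frompyfunc works: the ufunc it builds calls the Python function once per element of its input, not once per row or once per DataFrame. With nin=1 and the whole frame as input, each call therefore receives a single float, and indexing it with ['M0'] fails. A small sketch that makes this visible (the show_type helper is hypothetical, used only for illustration, and the frame is rebuilt by hand with the same values):

```python
import numpy as np
import pandas as pd

# Same values as the question's df, built directly instead of via read_csv.
df = pd.DataFrame(
    {
        "M0": [1, 7, 4], "M1": [2, 5, 8], "M2": [3, 4, 9],
        "M3": [4, 5, 3], "M4": [5, 8, 5], "M5": [6, 3, 2],
        "age": [3.2, 4.5, 6.7],
    },
    index=[1, 2, 3],
)

def show_type(x):
    # Hypothetical helper: report the type of whatever frompyfunc passes in.
    return type(x).__name__

# frompyfunc(show_type, 1, 1) applies show_type to every single cell.
vshow = np.frompyfunc(show_type, 1, 1)
types = vshow(df.to_numpy())  # mixed int/float columns become one float64 array

print(types.shape)             # one call per cell: 3 rows x 7 columns
print(set(types.ravel()))      # every call received a plain Python float
```

So inside the attempted func(df), the name df is bound to one float per call, which is exactly what the TypeError reports.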
Note that the reason I don't use df.apply() is that my real data is very large and df.apply() runs very slowly.
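Since speed is the stated reason for avoiding apply, one option worth benchmarking is to drop the per-row Python call entirely and compute the product with array operations. This is a swapped-in technique (a boolean mask plus np.prod), not what the question or the answer below uses; prod_first_n is a hypothetical name, and the sketch assumes every truncated age is non-negative, because a negative slice bound in the original [0:age] would mean "drop from the end" rather than "take none":

```python
import numpy as np
import pandas as pd

# Same values as the question's df.
df = pd.DataFrame(
    {
        "M0": [1, 7, 4], "M1": [2, 5, 8], "M2": [3, 4, 9],
        "M3": [4, 5, 3], "M4": [5, 8, 5], "M5": [6, 3, 2],
        "age": [3.2, 4.5, 6.7],
    },
    index=[1, 2, 3],
)

M = ["M0", "M1", "M2", "M3", "M4", "M5"]

def prod_first_n(ms, n):
    """Row-wise product of the first n[i] entries of ms[i].

    ms: 2-D array of shape (rows, k); n: 1-D integer array of shape (rows,).
    Assumes n >= 0 (hypothetical helper, not from the original post).
    """
    # mask[i, j] is True when column j is among the first n[i] columns of row i;
    # n larger than k simply selects every column, like an over-long slice.
    mask = np.arange(ms.shape[1]) < n[:, None]
    # Columns outside the window are replaced by 1 so they don't change the product.
    return np.where(mask, ms, 1).prod(axis=1)

df["newcol"] = prod_first_n(df[M].to_numpy(), df["age"].to_numpy().astype(int))
print(df)
```

Everything here runs in NumPy without a Python-level call per row, which is usually where the time goes with apply or frompyfunc; whether it is faster on the real data is something to measure rather than assume.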
Answer:
This is not optimized, but it at least makes selecting the M columns more readable, at the cost of an extra helper function:
M=["M0","M1","M2","M3","M4","M5"]
def func2(df, M):
    return [df[i].values for i in M]
def func(age,*Ms):
    newcol=np.prod(Ms[0:age])
    return newcol
vfunc = np.frompyfunc(func, len(M) + 1, 1)
df['newcol']=vfunc(df['age'].values.astype(int), *func2(df,M))
df
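If func2 feels like extra ceremony, the columns can also be unpacked straight from the DataFrame. The sketch below keeps the answer's func(age, *Ms) idea but selects the columns by name prefix; the assumption that every column whose name starts with "M" is one of the factors is added here and is not part of the answer:

```python
import numpy as np
import pandas as pd

# Same values as the question's df.
df = pd.DataFrame(
    {
        "M0": [1, 7, 4], "M1": [2, 5, 8], "M2": [3, 4, 9],
        "M3": [4, 5, 3], "M4": [5, 8, 5], "M5": [6, 3, 2],
        "age": [3.2, 4.5, 6.7],
    },
    index=[1, 2, 3],
)

def func(age, *ms):
    # Product of the first `age` M values of one row.
    return np.prod(ms[0:age])

# Assumption: every column starting with "M" is a factor, and nothing else is.
m_cols = [c for c in df.columns if c.startswith("M")]

vfunc = np.frompyfunc(func, len(m_cols) + 1, 1)
# .T turns the (rows, k) block into k arrays of length rows, one per M column;
# * unpacks them into separate positional arguments, matching func's *ms.
df["newcol"] = vfunc(df["age"].to_numpy().astype(int), *df[m_cols].to_numpy().T)
print(df)
```

The column count is derived from m_cols, so adding an M6 column would not require touching the frompyfunc call. Note that frompyfunc returns an object-dtype array, so newcol is an object column unless it is converted afterwards.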