pandas/numpy: how to simplify multiple vectorized function parameters

I have a DataFrame df, which you can reproduce by running this code:

import numpy as np
import pandas as pd
from io import StringIO
dfs = """
    M0     M1   M2  M3 M4   M5 age
1   1      2    3    4  5    6  3.2        
2   7      5    4    5  8    3  4.5
3   4      8    9    3  5    2  6.7
"""
df = pd.read_csv(StringIO(dfs.strip()), sep=r'\s+')

Based on business logic, I have the following function, and its output is what I expect:

def func(M0,M1,M2,M3,M4,M5,age):
    newcol=np.prod([M0,M1,M2,M3,M4,M5][0:age])
    return newcol

vfunc = np.frompyfunc(func, 7, 1)
df['newcol']=vfunc(df['M0'].values,df['M1'].values,df['M2'].values,df['M3'].values,df['M4'].values,df['M5'].values,df['age'].values.astype(int))

df

Output is:

   M0  M1  M2  M3  M4  M5  age newcol
1   1   2   3   4   5   6  3.2      6
2   7   5   4   5   8   3  4.5    700
3   4   8   9   3   5   2  6.7   8640

The problem is that there are too many parameters in def func(M0,M1,M2,M3,M4,M5,age). Is there any way to pass these parameters as a list, or something similar, to make the function cleaner?

I tried:

def func(df):
    newcol=np.prod
    ([df['M0'].values,df['M1'].values,df['M2'].values,df['M3'].values,df['M4'].values,df['M5'].values][0:df['age'].values.astype(int)])
    return newcol

vfunc = np.frompyfunc(func,1, 1)
df['newcol']=vfunc(df)

Error:

TypeError: 'float' object is not subscriptable
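
For context, the message is consistent with how np.frompyfunc works: the wrapped function is called once per scalar element, not once with the whole table. A minimal sketch on a plain NumPy array (the names arr, kinds and message are made up for illustration; the assumption is that a DataFrame passed to the ufunc is unpacked element by element in the same way):

```python
import numpy as np

# Hypothetical 2-D array standing in for the DataFrame's values.
arr = np.array([[1.0, 2.0], [3.0, 4.0]])

# Record the type of each argument the wrapped function receives.
kinds = np.frompyfunc(lambda x: type(x).__name__, 1, 1)(arr)
print(kinds)  # every entry is a scalar type name, not a table

# Indexing a scalar by column name fails, just like in the attempt above.
try:
    np.frompyfunc(lambda x: x["M0"], 1, 1)(arr)
    message = None
except TypeError as exc:
    message = str(exc)
print(message)
```

So func(df) never sees a DataFrame; it only sees single numbers, which is why df['M0'] cannot work inside it.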

Note that I don't use pandas apply() because my real data is very large and apply() runs very slowly.

CodePudding user response:

This is not optimized, but at least it can be more readable in terms of selecting the M columns, although it has an extra function:

M=["M0","M1","M2","M3","M4","M5"]

def func2(df, M):
    return [df[i].values for i in M] 

def func(age,*Ms):
    newcol=np.prod(Ms[0:age])
    return newcol

vfunc = np.frompyfunc(func, len(M) + 1, 1)

df['newcol']=vfunc(df['age'].values.astype(int), *func2(df,M))

df
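
Since speed on large data is the concern, a fully vectorized variant may also be worth trying. This is only a sketch, not the approach above: it assumes ages are non-negative and replaces the per-row Python call with a boolean mask. The names vals, age and mask are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question, built directly.
df = pd.DataFrame({
    "M0": [1, 7, 4], "M1": [2, 5, 8], "M2": [3, 4, 9],
    "M3": [4, 5, 3], "M4": [5, 8, 5], "M5": [6, 3, 2],
    "age": [3.2, 4.5, 6.7],
})
M = ["M0", "M1", "M2", "M3", "M4", "M5"]

vals = df[M].to_numpy()                   # shape (n_rows, 6)
age = df["age"].to_numpy().astype(int)    # truncate, as in the question

# mask[i, j] is True when column j is among the first age[i] columns.
mask = np.arange(len(M)) < age[:, None]

# Replace the unused columns with 1 so they do not affect the product.
df["newcol"] = np.where(mask, vals, 1).prod(axis=1)
print(df)
```

On the sample data this gives 6, 700 and 8640, the same as the expected output. Unlike the list slice, a negative age here yields 1 rather than dropping columns from the end.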