pandas / numpy: how to simplify a vectorized function with many parameters

Time:12-02

I have a DataFrame, which you can reproduce by running this code:

import numpy as np
import pandas as pd
from io import StringIO
dfs = """
    M0     M1   M2  M3 M4   M5 age
1   1      2    3    4  5    6  3.2        
2   7      5    4    5  8    3  4.5
3   4      8    9    3  5    2  6.7
"""
df = pd.read_csv(StringIO(dfs.strip()), sep=r'\s+')

Based on my business logic I have the following function, and its output is what I expect:

def func(M0,M1,M2,M3,M4,M5,age):
    newcol=np.prod([M0,M1,M2,M3,M4,M5][0:age])
    return newcol

vfunc = np.frompyfunc(func, 7, 1)
df['newcol']=vfunc(df['M0'].values,df['M1'].values,df['M2'].values,df['M3'].values,df['M4'].values,df['M5'].values,df['age'].values.astype(int))

df

Output is:

   M0  M1  M2  M3  M4  M5  age newcol
1   1   2   3   4   5   6  3.2      6
2   7   5   4   5   8   3  4.5    700
3   4   8   9   3   5   2  6.7   8640
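
As a sanity check on the output above: age is truncated to an integer, and newcol is the product of the first int(age) M values in that row. A minimal sketch that recomputes the three rows by hand (the `rows` list is just the table retyped for illustration):

```python
import numpy as np

# Each entry: (M0..M5 values, age), copied from the table above.
rows = [
    ([1, 2, 3, 4, 5, 6], 3.2),
    ([7, 5, 4, 5, 8, 3], 4.5),
    ([4, 8, 9, 3, 5, 2], 6.7),
]

# int(age) truncates toward zero, so 3.2 -> 3, 4.5 -> 4, 6.7 -> 6.
results = [int(np.prod(ms[: int(age)])) for ms, age in rows]
print(results)  # [6, 700, 8640]
```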

The problem is that def func(M0,M1,M2,M3,M4,M5,age) takes too many parameters. Is there any way to pass these parameters as a list, or something similar, to make the function cleaner?

I tried:

def func(df):
    newcol=np.prod([df['M0'].values,df['M1'].values,df['M2'].values,df['M3'].values,df['M4'].values,df['M5'].values][0:df['age'].values.astype(int)])
    return newcol

vfunc = np.frompyfunc(func,1, 1)
df['newcol']=vfunc(df)

Error:

TypeError: 'float' object is not subscriptable
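
A likely cause of this error: np.frompyfunc builds an element-wise function, so func is called once per scalar value rather than once with the whole DataFrame, and indexing a float with ['M0'] then fails. A minimal sketch of that behaviour, assuming a cut-down frame and a hypothetical `probe` function (neither is part of the original code):

```python
import numpy as np
import pandas as pd

# Cut-down frame for illustration only.
df = pd.DataFrame({"M0": [1, 7, 4], "M1": [2, 5, 8], "age": [3.2, 4.5, 6.7]})

calls = []          # one entry per call to probe
seen_types = set()  # the Python types probe actually receives

def probe(x):
    calls.append(x)
    seen_types.add(type(x).__name__)
    return x

# The ufunc walks the underlying array and calls probe once per element.
np.frompyfunc(probe, 1, 1)(df.to_numpy())

print(len(calls))   # one call per cell, not one call for the whole frame
print(seen_types)   # scalar types only; never a DataFrame
```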

Note that I am not using df.apply() because my real data is very large and apply() runs very slowly on it.

Answer:

This is not optimized, but it is at least more readable in how it selects the M columns, at the cost of one extra helper function:

M=["M0","M1","M2","M3","M4","M5"]

def func2(df, M):
    return [df[i].values for i in M] 

def func(age,*Ms):
    newcol=np.prod(Ms[0:age])
    return newcol

# one input for age plus one per M column
vfunc = np.frompyfunc(func, len(M) + 1, 1)

df['newcol']=vfunc(df['age'].values.astype(int), *func2(df,M))

df
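
Since np.frompyfunc still calls the Python function once per row, it may not be much faster than apply() on large data. One possible alternative, sketched here under assumptions rather than taken from the answer above, is to compute running products with np.cumprod and pick one entry per row. The helper name `prefix_product` and its `age_col` parameter are hypothetical, and the sketch assumes age is non-negative (a negative age would slice from the end in the original func, which this version does not reproduce).

```python
import numpy as np
import pandas as pd

# Same data as in the question, built directly for a self-contained example.
df = pd.DataFrame(
    {
        "M0": [1, 7, 4], "M1": [2, 5, 8], "M2": [3, 4, 9],
        "M3": [4, 5, 3], "M4": [5, 8, 5], "M5": [6, 3, 2],
        "age": [3.2, 4.5, 6.7],
    },
    index=[1, 2, 3],
)

M = ["M0", "M1", "M2", "M3", "M4", "M5"]

def prefix_product(df, cols, age_col="age"):
    """Product of the first int(age) columns of `cols`, row by row."""
    values = df[cols].to_numpy()
    # running[:, j] holds the product of the first j + 1 columns.
    running = np.cumprod(values, axis=1)
    # Prepend a column of ones so that k == 0 gives the empty product, 1.
    ones = np.ones((len(df), 1), dtype=running.dtype)
    running = np.hstack([ones, running])
    # Truncate age to int and cap it at the number of columns,
    # mirroring how a slice past the end of a list behaves.
    k = np.clip(df[age_col].to_numpy().astype(int), 0, len(cols))
    return running[np.arange(len(df)), k]

df["newcol"] = prefix_product(df, M)
print(df["newcol"].tolist())  # [6, 700, 8640]
```

The function takes the column list as a single argument, so adding or removing M columns only changes `M`. Like np.prod on integer columns, the running products can overflow for long rows of large values.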