I have a DataFrame that you can reproduce by running this code:
import numpy as np
import pandas as pd
from io import StringIO
# np.prod(PofMinTab1[LowerIntegralAge1[0]-1:(LowerIntegralAge1[0])])
dfs = """
M0 M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 age0 age4
1 1 2 3 4 5 6 1 2 3 4 5 6 1 3 5
2 7 5 4 5 8 3 1 2 3 4 5 6 1 4 8
3 4 8 9 3 5 2 1 2 3 4 5 6 1 6 9
"""
df = pd.read_csv(StringIO(dfs.strip()), sep=r'\s+')
I also have a function that I call through .apply:
def func(row):
    age0 = row['age0']
    age4 = row['age4']
    mt = [row['M0'], row['M1'], row['M2'], row['M3'], row['M4'], row['M5'], row['M6'],
          row['M7'], row['M8'], row['M9'], row['M10'], row['M11'], row['M12']]
    return np.prod(mt[age0:age4])
df['newcol'] = df.apply(lambda row: func(row), axis=1)
The output is:
M0 M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 age0 age4 newcol
1 1 2 3 4 5 6 1 2 3 4 5 6 1 3 5 20
2 7 5 4 5 8 3 1 2 3 4 5 6 1 4 8 48
3 4 8 9 3 5 2 1 2 3 4 5 6 1 6 9 6
My real data has 100,000 rows, and .apply is very slow on it, so I have already converted most of my other functions to vectorized versions.
My question: is there a way to convert this one to a vectorized NumPy version, or any other way to make it run much faster?
Any help would be appreciated.
CodePudding user response:
One idea I had was to mask the original data using a condition on column indices.
# Create indices for the columns you want to compute the product over
idx = np.arange(len(df.columns) - 2)
# Create a mask of bools which correspond to the values the product
# should be computed for
m = ((idx[None, :] >= df['age0'].to_numpy()[:,None])
& (idx < df['age4'].to_numpy()[:,None]))
# Use `np.where` to apply the mask and `np.prod` to compute the row-wise product
df['result'] = np.prod(np.where(m, df.iloc[:, :-2], 1), axis=1)
df
M0 M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 age0 age4 result
1 1 2 3 4 5 6 1 2 3 4 5 6 1 3 5 20
2 7 5 4 5 8 3 1 2 3 4 5 6 1 4 8 48
3 4 8 9 3 5 2 1 2 3 4 5 6 1 6 9 6
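To make the broadcasting in the mask easier to follow, here is a minimal, self-contained sketch of the same idea wrapped in a function. The name `masked_row_product` and the hard-coded sample frame are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

def masked_row_product(values, start, stop):
    """Return the product of values[i, start[i]:stop[i]] for every row i."""
    values = np.asarray(values)
    start = np.asarray(start)[:, None]          # shape (n_rows, 1)
    stop = np.asarray(stop)[:, None]            # shape (n_rows, 1)
    idx = np.arange(values.shape[1])[None, :]   # shape (1, n_cols)
    # Broadcasting compares every column position with each row's bounds,
    # giving a boolean mask of shape (n_rows, n_cols).
    mask = (idx >= start) & (idx < stop)
    # Positions outside the slice are replaced by 1 so they do not
    # change the product.
    return np.where(mask, values, 1).prod(axis=1)

# Same sample data as in the question, built directly for reproducibility.
m_cols = [f"M{i}" for i in range(13)]
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 3, 5],
     [7, 5, 4, 5, 8, 3, 1, 2, 3, 4, 5, 6, 1, 4, 8],
     [4, 8, 9, 3, 5, 2, 1, 2, 3, 4, 5, 6, 1, 6, 9]],
    columns=m_cols + ["age0", "age4"],
    index=[1, 2, 3],
)

result = masked_row_product(df[m_cols], df["age0"], df["age4"])
print(result.tolist())  # [20, 48, 6]
```

Selecting the M columns by name rather than with `iloc[:, :-2]` keeps the function correct if extra columns (such as `newcol`) have already been appended to the frame. An empty slice (`age0 == age4`) yields 1, which matches `np.prod` of an empty list.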
CodePudding user response:
You can also try a plain loop, written here as a list comprehension:
[df.loc[z][x:y].prod() for x , y, z in zip(df['age0'],df['age4'],df.index)]
Out[43]: [20, 48, 6]
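A variation on the loop above, sketched under the assumption that per-row `df.loc` lookups are a large part of the cost: convert the columns to NumPy arrays once and slice those instead. The variable names are illustrative, and the speed-up has not been measured here, so it is worth timing on your own data.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question, built directly for reproducibility.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 3, 5],
     [7, 5, 4, 5, 8, 3, 1, 2, 3, 4, 5, 6, 1, 4, 8],
     [4, 8, 9, 3, 5, 2, 1, 2, 3, 4, 5, 6, 1, 6, 9]],
    columns=[f"M{i}" for i in range(13)] + ["age0", "age4"],
    index=[1, 2, 3],
)

# Convert once, outside the loop, so each iteration works on plain arrays.
vals = df.iloc[:, :-2].to_numpy()   # the M0..M12 block
starts = df["age0"].to_numpy()
stops = df["age4"].to_numpy()

# Slice each row by position and multiply the selected values.
out = [int(vals[i, a:b].prod()) for i, (a, b) in enumerate(zip(starts, stops))]
print(out)  # [20, 48, 6]
```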