So I'm on a mission to hunt down some memory hogs in my script, and having read about the problems with iterrows()/itertuples() I'm currently trying to figure out how to use vectorization to do what is probably quite simple.
The issue boils down to:
A dataframe:
df = pd.DataFrame(data=d)
   col1  col2  col3
0     1   NaN   NaN
1     2   NaN   NaN
A function returning two different values based on a single input:
def func(a):
    return a + 1, a + 5
Now I would like to apply this function to each cell in col1 that is equal to 1, and store the result in col2 and col3 of the same row.
At the moment, I'm doing this with a more extended version of this abomination:
for row in df.loc[df['col1'] == 1].itertuples():
    a, b = func(df.loc[row.Index, 'col1'])
    df.loc[row.Index, 'col2'] = a
    df.loc[row.Index, 'col3'] = b
Resulting in:
   col1  col2  col3
0     1   2.0   6.0
1     2   NaN   NaN
How would you re-write this to be vectorized/performant? Thanks
CodePudding user response:
Try:
df['col2'] = df.where(df["col1"] == 1)["col1"].apply(lambda x: func(x)[0])
df['col3'] = df.where(df["col1"] == 1)["col1"].apply(lambda x: func(x)[1])
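Since func in the question is plain arithmetic, it should also accept a whole Series, which avoids calling it twice per row through apply. A minimal sketch, assuming d in the question holds the values shown in the printed frame (the name masked is only for illustration):

```python
import numpy as np
import pandas as pd

def func(a):
    # Same function as in the question: works on scalars and on Series
    return a + 1, a + 5

# Assumed reconstruction of the question's frame
df = pd.DataFrame({"col1": [1, 2],
                   "col2": [np.nan, np.nan],
                   "col3": [np.nan, np.nan]})

# Keep col1 where it equals 1, NaN elsewhere; NaN then propagates through func
masked = df["col1"].where(df["col1"] == 1)

# One call to func on the whole column, unpacked into the two target columns
df["col2"], df["col3"] = func(masked)
```

This only works when func is built from operations that pandas can apply element-wise; a function with Python-level branching on its argument would still need apply or a rewrite.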
CodePudding user response:
def func(a):
    return a + 1, a + 5
df['col2'], df['col3'] = np.where(
    df['col1'] == 1,   # condition
    func(df['col1']),  # if true, do this
    np.nan             # if false, do this
)
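Note that np.where writes NaN into col2 and col3 for every row that does not match, which is fine here because those columns start out empty. If they already held values worth keeping, a boolean mask with .loc would touch only the matching rows. A rough sketch, where the starting values 10.0 to 40.0 are made up purely to show that they survive:

```python
import pandas as pd

def func(a):
    # Same function as in the question
    return a + 1, a + 5

# Hypothetical frame whose col2/col3 already contain data
df = pd.DataFrame({"col1": [1, 2],
                   "col2": [10.0, 20.0],
                   "col3": [30.0, 40.0]})

mask = df["col1"] == 1

# Run func only on the matching rows, then assign back by index alignment
new_col2, new_col3 = func(df.loc[mask, "col1"])
df.loc[mask, "col2"] = new_col2
df.loc[mask, "col3"] = new_col3
```

Row 0 gets the computed values while row 1 keeps its original 20.0 and 40.0.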
CodePudding user response:
For your example you can use apply and put the condition inside the function:
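import numpy as np
import pandas as pd
df = pd.DataFrame([[1, 0, 0], [2, 0, 0]], columns=['col1', 'col2', 'col3'])
def funct(a):
    if a == 1:
        return a + 1, a + 5
    # Return a NaN pair otherwise so every row yields two values
    return np.nan, np.nan
df[['col2', 'col3']] = df.col1.apply(funct).tolist()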