Vectorized conditional column

Time:10-25

Say I had a data frame like the following:

import pandas as pd

df = pd.DataFrame()

df['v'] = [0,0,0,0,0,1,1,1,1]
df['w'] = [1,1,1,1,1,0,0,0,0]

df['x'] = (df.v + df.w) + 10
df['y'] = (df.v + df.w) + 5

df['z'] = ...

I need a new column, df.z, to equal df.x where df.v == 1 and df.y where df.w == 1.

Of course, I could use df.apply here:

def non_vector(row):
    
    if row['v'] == 1: return row['x']
    if row['w'] == 1: return row['y'] 

df['z'] = df.apply(non_vector, axis=1)

print(df)

   v  w   x  y   z
0  0  1  11  6   6
1  0  1  11  6   6
2  0  1  11  6   6
3  0  1  11  6   6
4  0  1  11  6   6
5  1  0  11  6  11
6  1  0  11  6  11
7  1  0  11  6  11
8  1  0  11  6  11

But the problem seems straightforward enough for a vectorized method, and df.apply is painfully slow here.

Any help appreciated.

CodePudding user response:

Why not do this:

import numpy as np

df['z'] = np.where(df['v'] == 1, df['x'], np.where(df['v'] == 0, df['y'], np.nan))

If df.v only takes the values 0 and 1, then

df['z'] = np.where(df['v'] == 1, df['x'], df['y'])

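is enough. In both cases you'd get the values below. The float formatting (6.0, 11.0) comes from the np.nan fallback in the first form; the second form keeps z as integers: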

   v  w   x  y     z
0  0  1  11  6   6.0
1  0  1  11  6   6.0
2  0  1  11  6   6.0
3  0  1  11  6   6.0
4  0  1  11  6   6.0
5  1  0  11  6  11.0
6  1  0  11  6  11.0
7  1  0  11  6  11.0
8  1  0  11  6  11.0
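
For anyone who wants to run this end to end, here is a minimal sketch. It assumes x and y are built as (v + w) + 10 and (v + w) + 5, which matches the 11 and 6 in the printed output. The column names z_where and z_select are only illustrative. The np.select variant is not part of the answer above; it is an alternative that checks df.v and df.w separately, as the question describes, and leaves NaN wherever neither condition holds.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "v": [0, 0, 0, 0, 0, 1, 1, 1, 1],
    "w": [1, 1, 1, 1, 1, 0, 0, 0, 0],
})
df["x"] = (df.v + df.w) + 10
df["y"] = (df.v + df.w) + 5

# Two-way choice: only safe if v is always 0 or 1.
df["z_where"] = np.where(df["v"] == 1, df["x"], df["y"])

# Explicit conditions on v and w, checked in order;
# rows that match neither condition get NaN.
conditions = [df["v"] == 1, df["w"] == 1]
choices = [df["x"], df["y"]]
df["z_select"] = np.select(conditions, choices, default=np.nan)

print(df)
```

np.where fits when there are exactly two outcomes. Once there are three or more branches, nesting np.where calls gets hard to read, and a list of conditions passed to np.select is usually easier to follow.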