Python apply lambda with multiple columns

Can anyone tell me why this doesn't work and how to fix it?

I'm trying to use a lambda function to choose the value of a column based on a condition on another column.

df = pd.DataFrame({'A': [4, 8, 2, 7, 4],
                   'B': [8, 10, 3, 4, 1],
                   'C': [10, 8, 2, 6, 2]})

df

`df.apply(lambda x: x['B'] if x['A'].isin([1,2,3,4,5]) else x['C'])`

KeyError                                  Traceback (most recent call last)
c:\xxxxxx\xxxxxx\xxx Cellule 19 in <cell line: 1>()
----> 1 df.apply(lambda x: x['B'] if x['A'].isin([1,2,3,4,5]) else x['C'])

File c:\Anaconda\envs\xxxxx\xxxx.py:8839, in DataFrame.apply(self, func, axis, raw, result_type, args, **kwargs)
   8828 from pandas.core.apply import frame_apply
   8830 op = frame_apply(
   8831     self,
   8832     func=func,
   (...)
   8837     kwargs=kwargs,
   8838 )
-> 8839 return op.apply().__finalize__(self, method="apply")

File c:\Anaconda\xxxxxlib\site-packages\pandas\core\apply.py:727, in FrameApply.apply(self)
    724 elif self.raw:
    725     return self.apply_raw()
--> 727 return self.apply_standard()

File c:\Anaconda\envs\xxxx\pandas\core\apply.py:851, in FrameApply.apply_standard(self)
    850 def apply_standard(self):
--> 851     results, res_index = self.apply_series_generator()
    853     # wrap results
    854     return self.wrap_results(results, res_index)
...
    388     self._check_indexing_error(key)
--> 389     raise KeyError(key)
    390 return super().get_loc(key, method=method, tolerance=tolerance)

KeyError: 'A'

CodePudding user response：

Do not use apply here; it wastes pandas' vectorized capabilities.

Use instead:

df['new'] = df['B'].where(df['A'].isin([1,2,3,4,5]), df['C'])

# or
df['new'] = df['B'].where(df['A'].between(1, 5, inclusive='both'), df['C'])

Or with numpy:

import numpy as np
df['new'] = np.where(df['A'].isin([1,2,3,4,5]), df['B'], df['C'])

output:

   A   B   C  new
0  4   8  10    8
1  8  10   8    8
2  2   3   2    3
3  7   4   6    6
4  4   1   2    1
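
For anyone who wants to check this end to end, here is a minimal self-contained sketch, assuming the same frame as in the question. The names mask, via_where and via_numpy are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [4, 8, 2, 7, 4],
                   'B': [8, 10, 3, 4, 1],
                   'C': [10, 8, 2, 6, 2]})

# Boolean mask: True where A is one of the allowed values.
mask = df['A'].isin([1, 2, 3, 4, 5])

# Series.where keeps B where the mask is True and takes C elsewhere.
via_where = df['B'].where(mask, df['C'])

# np.where picks B where the mask is True, otherwise C.
# It returns a NumPy array rather than a Series.
via_numpy = np.where(mask, df['B'], df['C'])

print(via_where.tolist())  # [8, 8, 3, 6, 1]
print(via_numpy.tolist())  # [8, 8, 3, 6, 1]
```

Both variants evaluate the condition once for the whole column instead of calling a Python function per row.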

CodePudding user response：

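You need to pass the axis=1 argument so that the function is applied to each row rather than to each column (see DataFrame.apply). With axis=1, x['A'] is a scalar, which has no .isin method, so test membership with `in`:

df.apply(lambda x: x['B'] if x['A'] in [1,2,3,4,5] else x['C'], axis=1)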
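
To see what the lambda actually receives under each axis setting, a small sketch along these lines may help. It assumes the frame from the question; names_axis0, names_axis1 and allowed are illustrative names only.

```python
import pandas as pd

df = pd.DataFrame({'A': [4, 8, 2, 7, 4],
                   'B': [8, 10, 3, 4, 1],
                   'C': [10, 8, 2, 6, 2]})

# Default axis=0: each call receives a whole column, indexed by row labels,
# so looking up 'A' inside it raises KeyError.
names_axis0 = df.apply(lambda s: s.name).tolist()
print(names_axis0)  # ['A', 'B', 'C']

# axis=1: each call receives one row, indexed by column names.
names_axis1 = df.apply(lambda s: s.name, axis=1).tolist()
print(names_axis1)  # [0, 1, 2, 3, 4]

# Row-wise version: row['A'] is a scalar here, so use `in` instead of .isin.
allowed = [1, 2, 3, 4, 5]
df['new'] = df.apply(lambda row: row['B'] if row['A'] in allowed else row['C'],
                     axis=1)
print(df['new'].tolist())  # [8, 8, 3, 6, 1]
```

This gives the same column as the vectorized answers, but it runs the lambda once per row, so it will likely be slower on large frames.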