Can anyone tell me why it doesn't work and how to fix it?
I'm trying to use a lambda function to pick, for each row, the value from column B or column C depending on a condition on column A.
import pandas as pd
df = pd.DataFrame({'A': [4, 8, 2, 7, 4],
                   'B': [8, 10, 3, 4, 1],
                   'C': [10, 8, 2, 6, 2]})
df
`df.apply(lambda x: x['B'] if x['A'].isin([1,2,3,4,5]) else x['C'])`
KeyError Traceback (most recent call last)
c:\xxxxxx\xxxxxx\xxx Cellule 19 in <cell line: 1>()
----> 1 df.apply(lambda x: x['B'] if x['A'].isin([1,2,3,4,5]) else x['C'])
File c:\Anaconda\envs\xxxxx\xxxx.py:8839, in DataFrame.apply(self, func, axis, raw, result_type, args, **kwargs)
8828 from pandas.core.apply import frame_apply
8830 op = frame_apply(
8831 self,
8832 func=func,
(...)
8837 kwargs=kwargs,
8838 )
-> 8839 return op.apply().__finalize__(self, method="apply")
File c:\Anaconda\xxxxxlib\site-packages\pandas\core\apply.py:727, in FrameApply.apply(self)
724 elif self.raw:
725 return self.apply_raw()
--> 727 return self.apply_standard()
File c:\Anaconda\envs\xxxx\pandas\core\apply.py:851, in FrameApply.apply_standard(self)
850 def apply_standard(self):
--> 851 results, res_index = self.apply_series_generator()
853 # wrap results
854 return self.wrap_results(results, res_index)
...
388 self._check_indexing_error(key)
--> 389 raise KeyError(key)
390 return super().get_loc(key, method=method, tolerance=tolerance)
KeyError: 'A'
Answer:
Do not use `apply`: it calls your Python function once per row (or column) and so throws away pandas' vectorized, whole-column operations.
Use instead:
df['new'] = df['B'].where(df['A'].isin([1,2,3,4,5]), df['C'])
# or
df['new'] = df['B'].where(df['A'].between(1, 5, inclusive='both'), df['C'])
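The `isin` and `between` versions give the same result here only because column A holds integers. A small sketch (the float Series `s` is a hypothetical example, not from the question) shows where they diverge: `isin` tests exact membership, while `between` tests a closed range.

```python
import pandas as pd

# Hypothetical float values to show where the two conditions differ
s = pd.Series([1, 4.5, 5, 6])

# Exact membership: 4.5 is not one of 1, 2, 3, 4, 5
print(s.isin([1, 2, 3, 4, 5]).tolist())            # [True, False, True, False]

# Closed range 1 <= x <= 5: 4.5 is inside it
print(s.between(1, 5, inclusive='both').tolist())  # [True, True, True, False]
```

Pick `isin` when you mean a specific set of values and `between` when you mean a numeric range.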
Or with numpy:
import numpy as np
df['new'] = np.where(df['A'].isin([1,2,3,4,5]), df['B'], df['C'])
Output:
   A   B   C  new
0  4   8  10    8
1  8  10   8    8
2  2   3   2    3
3  7   4   6    6
4  4   1   2    1
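To check the claim that `apply` wastes pandas' vectorized capabilities on your own machine, here is a rough timing sketch. The 20,000-row random frame `big` is a hypothetical stand-in for real data, and timings depend on hardware and pandas version, so no numbers are shown.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical larger frame with random integers 0..9 in columns A, B, C
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.integers(0, 10, size=(20_000, 3)), columns=['A', 'B', 'C'])

# Row-wise apply: the Python lambda runs once per row
start = time.perf_counter()
slow = big.apply(lambda row: row['B'] if row['A'] in [1, 2, 3, 4, 5] else row['C'], axis=1)
t_apply = time.perf_counter() - start

# Vectorized: one boolean mask over the whole column, one selection step
start = time.perf_counter()
fast = big['B'].where(big['A'].isin([1, 2, 3, 4, 5]), big['C'])
t_where = time.perf_counter() - start

# Both approaches should select exactly the same values
print(np.array_equal(slow.to_numpy(), fast.to_numpy()))
print(f"apply(axis=1): {t_apply:.4f} s, where: {t_where:.4f} s")
```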
Answer:
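You need to pass `axis=1`. By default `apply` uses `axis=0`, so the lambda receives each column as a Series indexed by the row labels (0 to 4); `x['A']` then looks for a row labelled `'A'` and raises `KeyError: 'A'`. With `axis=1` the lambda receives each row, indexed by the column labels. Note that `x['A']` is then a single number, which has no `.isin()` method, so test membership with `in` instead. Refer to the `DataFrame.apply` documentation.
df.apply(lambda x: x['B'] if x['A'] in [1, 2, 3, 4, 5] else x['C'], axis=1)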
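A minimal sketch of what the lambda actually receives in each mode, using the question's frame. It reproduces the original `KeyError`, then shows that with `axis=1` each `x` is a row whose `x['A']` is a scalar, which is why the sketch tests membership with the `in` operator rather than `.isin()`.

```python
import pandas as pd

df = pd.DataFrame({'A': [4, 8, 2, 7, 4],
                   'B': [8, 10, 3, 4, 1],
                   'C': [10, 8, 2, 6, 2]})

# axis=0 (default): each x is a column, so x.name is a column label
col_labels = df.apply(lambda x: x.name)
print(col_labels.tolist())   # ['A', 'B', 'C']

# ...and its index holds row labels 0..4, so x['A'] raises KeyError, as in the question
raised = False
try:
    df.apply(lambda x: x['A'])
except KeyError:
    raised = True
print(raised)                # True

# axis=1: each x is a row, so x.name is a row label and x['A'] works
row_labels = df.apply(lambda x: x.name, axis=1)
print(row_labels.tolist())   # [0, 1, 2, 3, 4]

# x['A'] is a scalar here, so use `in` for the membership test
df['new'] = df.apply(lambda x: x['B'] if x['A'] in [1, 2, 3, 4, 5] else x['C'], axis=1)
print(df['new'].tolist())    # [8, 8, 3, 6, 1]
```

This works, but it still runs the lambda once per row; the vectorized `where`/`np.where` forms give the same column without that per-row overhead.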