I have a DataFrame called df2, as shown:
Name     Age  Experience  Education
Archana  35   8           Bachelors
Sharad   39   12          Bachelors
Jitesh   30   2           Diploma
Sukanya  45   18          Bachelors
Shirish  40   15          Bachelors
I want to add a column promotion that is set to 1 for the rows where all of the following conditions hold:
- Education = Bachelors
- Experience > 10
- Age > 30
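Hence the expected df should be:
Name     Age  Experience  Education  promotion
Archana  35   8           Bachelors  0
Sharad   39   12          Bachelors  1
Jitesh   30   2           Diploma    0
Sukanya  45   18          Bachelors  1
Shirish  40   15          Bachelors  1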
I know that I can use np.where for this task, but I had to convert all the columns to string type because the Education column is a string.
Is there a faster way, apart from np.where, to get the same result without converting the columns?
I tried:
df2['prom'] = (df2['Age']>30)&(df2['experience']>10)&(df2['education' == 'Bachelors'])
But it gives me the following error:
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: False
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_6476/2030827498.py in <module>
1 #df2['ELIGIBLE_FOR_DISCOUNT'] = np.where((df2['TENURE'] >= '60') & (df2['NO_OF_FAMILY_MEMBERS'] >= '4') & (df2['EMPLOYMENT_STATUS'] =='N'), 1, 0)
2
----> 3 df2['ELIGIBLE_FOR_DISCOUNT'] = (df2['TENURE']>60)&(df2['NO_OF_FAMILY_MEMBERS']>3)&(df2['EMPLOYMENT_STATUS' == 'N'])
4
5
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
3456 if self.columns.nlevels > 1:
3457 return self._getitem_multilevel(key)
-> 3458 indexer = self.columns.get_loc(key)
3459 if is_integer(indexer):
3460 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: False
CodePudding user response:
Use (note that the closing bracket goes right after the column name, before the comparison):
df['prom'] = (df['Age']>30)&(df['Experience']>10)&(df['Education'] == 'Bachelors')
If the Age and Experience columns are not numeric:
df['prom'] = (df['Age'].astype(int)>30)&(df['Experience'].astype(int)>10)&(df['Education'] == 'Bachelors')
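To see where KeyError: False comes from, here is a small sketch. It assumes the sample data from the question with the capitalised column names from the table; the variable names key, error_message and mask are only for illustration. With the comparison inside the brackets, Python compares two string literals first, gets False, and pandas then looks for a column labelled False.

```python
import pandas as pd

# Sample data copied from the question
df2 = pd.DataFrame({
    "Name": ["Archana", "Sharad", "Jitesh", "Sukanya", "Shirish"],
    "Age": [35, 39, 30, 45, 40],
    "Experience": [8, 12, 2, 18, 15],
    "Education": ["Bachelors", "Bachelors", "Diploma", "Bachelors", "Bachelors"],
})

# The comparison is evaluated before the lookup: two different
# string literals are compared, which gives the plain bool False.
key = ("Education" == "Bachelors")

try:
    df2[key]  # same as df2[False]: pandas searches for a column named False
    error_message = None
except KeyError as err:
    error_message = repr(err)

# Closing the bracket first selects the column, and the comparison
# is then applied element-wise, giving a boolean Series.
mask = df2["Education"] == "Bachelors"
```

So the fix is only a matter of bracket placement: df2['Education'] == 'Bachelors' rather than df2['Education' == 'Bachelors'].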
CodePudding user response:
As suggested in one of the comments, use:
df['promotion'] = (df['Education'].eq('Bachelors') & df['Experience'].gt(10) & df['Age'].gt(30)).astype(int)
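For reference, a self-contained sketch of this approach on the sample data from the question. The DataFrame is rebuilt by hand here, and the extra column promotion_np is a hypothetical name used only to show that np.where also works on the same boolean mask without converting any column to string.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df2 = pd.DataFrame({
    "Name": ["Archana", "Sharad", "Jitesh", "Sukanya", "Shirish"],
    "Age": [35, 39, 30, 45, 40],
    "Experience": [8, 12, 2, 18, 15],
    "Education": ["Bachelors", "Bachelors", "Diploma", "Bachelors", "Bachelors"],
})

# Each comparison yields a boolean Series; & combines them row by row.
# The numeric columns stay numeric and the string column stays a string.
mask = (
    df2["Education"].eq("Bachelors")
    & df2["Experience"].gt(10)
    & df2["Age"].gt(30)
)

# Cast the boolean mask to 1/0
df2["promotion"] = mask.astype(int)

# Equivalent with np.where, operating on the mask rather than on strings
df2["promotion_np"] = np.where(mask, 1, 0)

print(df2)
```

Both columns hold the same values, so the choice between astype(int) and np.where is mostly a matter of taste when the outputs are just 1 and 0.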