Filtering Data in pandas Based on Conditions

Time:02-27

I have a DataFrame named df2:

Name    Age Experience  Education
Archana 35  8           Bachelors
Sharad  39  12          Bachelors
Jitesh  30  2           Diploma
Sukanya 45  18          Bachelors
Shirish 40  15          Bachelors
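To make the question reproducible, here is a minimal sketch that builds the sample table as a DataFrame. The dtypes are an assumption (Age and Experience as integers, Name and Education as text), because the post does not state them.

```python
import pandas as pd

# Sample data from the question; Age and Experience are assumed to be integers
df2 = pd.DataFrame({
    "Name": ["Archana", "Sharad", "Jitesh", "Sukanya", "Shirish"],
    "Age": [35, 39, 30, 45, 40],
    "Experience": [8, 12, 2, 18, 15],
    "Education": ["Bachelors", "Bachelors", "Diploma", "Bachelors", "Bachelors"],
})

# Each column keeps its own dtype: integers for Age/Experience, text for Name/Education
print(df2.dtypes)
```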

I want to filter the data and add a column named promotion, set to 1 in the df when all of the following conditions are met:

  1. Education is Bachelors
  2. Experience > 10
  3. Age > 30
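Each of the three conditions above maps to one boolean Series, and the Series are combined with the element-wise & operator. A sketch, assuming the column names and integer dtypes from the sample table:

```python
import pandas as pd

# Sample data from the question (integer Age and Experience assumed)
df2 = pd.DataFrame({
    "Name": ["Archana", "Sharad", "Jitesh", "Sukanya", "Shirish"],
    "Age": [35, 39, 30, 45, 40],
    "Experience": [8, 12, 2, 18, 15],
    "Education": ["Bachelors", "Bachelors", "Diploma", "Bachelors", "Bachelors"],
})

# One boolean Series per condition
is_bachelors = df2["Education"] == "Bachelors"  # condition 1
experienced = df2["Experience"] > 10            # condition 2
over_30 = df2["Age"] > 30                       # condition 3

# & is element-wise AND: a row is True only if all three masks are True
promote = is_bachelors & experienced & over_30
print(promote.tolist())  # [False, True, False, True, True]
```

When the comparisons are written inline rather than stored in variables, wrap each one in parentheses, as in (df2['Age'] > 30) & (df2['Experience'] > 10), because & binds more tightly than > and ==.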

The expected df should be:

[image: df2 with the added promotion column]

I know I can use np.where for this task, but then I have to convert all the columns to strings because the Education column is of string type.

Is there a faster way than np.where to get the same result without converting the columns?
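For what it is worth, np.where itself does not need any conversion: its first argument is just a boolean array, built from comparisons that each run on their own column's dtype. Converting numbers to strings would actually break the numeric comparisons, because strings compare character by character. A minimal sketch, assuming integer Age and Experience columns:

```python
import numpy as np
import pandas as pd

# Sample data from the question (integer Age and Experience assumed)
df2 = pd.DataFrame({
    "Name": ["Archana", "Sharad", "Jitesh", "Sukanya", "Shirish"],
    "Age": [35, 39, 30, 45, 40],
    "Experience": [8, 12, 2, 18, 15],
    "Education": ["Bachelors", "Bachelors", "Diploma", "Bachelors", "Bachelors"],
})

# np.where(condition, value_if_true, value_if_false); no string conversion needed
df2["promotion"] = np.where(
    (df2["Education"] == "Bachelors") & (df2["Experience"] > 10) & (df2["Age"] > 30),
    1,
    0,
)
print(df2["promotion"].tolist())  # [0, 1, 0, 1, 1]

# Why converting to strings is a trap: "8" vs "10" compares "8" with "1" first
print("8" > "10")  # True
```

Both np.where and a boolean mask followed by .astype(int) are vectorized, so either one avoids a Python-level loop over the rows.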

I tried:

df2['prom'] = (df2['Age']>30)&(df2['experience']>10)&(df2['education' == 'Bachelors'])

But it gives me the following error:

KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: False

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_6476/2030827498.py in <module>
      1 #df2['ELIGIBLE_FOR_DISCOUNT'] = np.where((df2['TENURE'] >= '60') & (df2['NO_OF_FAMILY_MEMBERS'] >= '4') & (df2['EMPLOYMENT_STATUS'] =='N'), 1, 0)
      2 
----> 3 df2['ELIGIBLE_FOR_DISCOUNT'] = (df2['TENURE']>60)&(df2['NO_OF_FAMILY_MEMBERS']>3)&(df2['EMPLOYMENT_STATUS' == 'N'])
      4 
      5 

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3456             if self.columns.nlevels > 1:
   3457                 return self._getitem_multilevel(key)
-> 3458             indexer = self.columns.get_loc(key)
   3459             if is_integer(indexer):
   3460                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: False
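The KeyError: False comes from the last term. Python evaluates the expression inside the brackets before pandas sees it, so 'EMPLOYMENT_STATUS' == 'N' (and 'education' == 'Bachelors' in the shorter snippet) compares two plain string literals and yields False; df2[False] then looks for a column literally named False. A minimal reproduction on a hypothetical two-row frame, followed by the corrected form:

```python
import pandas as pd

# Hypothetical minimal frame, just enough to reproduce the error
df2 = pd.DataFrame({"Education": ["Bachelors", "Diploma"]})

# This compares two plain strings before pandas is involved
key = "Education" == "Bachelors"
print(key)  # False

try:
    df2[key]  # looks up a column named False
except KeyError as err:
    print("KeyError:", err)

# Correct: select the column first, then compare its values
mask = df2["Education"] == "Bachelors"
print(mask.tolist())  # [True, False]
```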

Answer:

Use:

df['prom'] = (df['Age'] > 30) & (df['Experience'] > 10) & (df['Education'] == 'Bachelors')

If the Age and Experience columns are not numeric:

df['prom'] = (df['Age'].astype(int) > 30) & (df['Experience'].astype(int) > 10) & (df['Education'] == 'Bachelors')
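astype(int) works when every value is a clean integer string, but raises if any value is not. A slightly more forgiving sketch uses pd.to_numeric with errors='coerce', which turns unparseable values into NaN; NaN compares as False, so such rows are simply not flagged. The string data below is hypothetical:

```python
import pandas as pd

# Hypothetical data where the numeric columns arrived as strings
df = pd.DataFrame({
    "Age": ["35", "39", "30", "n/a"],
    "Experience": ["8", "12", "2", "18"],
    "Education": ["Bachelors", "Bachelors", "Diploma", "Bachelors"],
})

# Convert to numbers; anything unparseable becomes NaN instead of raising
age = pd.to_numeric(df["Age"], errors="coerce")
experience = pd.to_numeric(df["Experience"], errors="coerce")

# NaN > 30 is False, so rows with bad data get 0
df["prom"] = ((age > 30) & (experience > 10) & (df["Education"] == "Bachelors")).astype(int)
print(df["prom"].tolist())  # [0, 1, 0, 0]
```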

Answer:

As suggested in one of the comments, use:

df['promotion'] = (df['Education'].eq('Bachelors') & df['Experience'].gt(10) & df['Age'].gt(30)).astype(int)
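Applied to the sample data from the question (integer Age and Experience assumed), this flags Sharad, Sukanya and Shirish. The same mask can also filter the rows directly with df.loc, which covers the "filtering" part of the title. A sketch:

```python
import pandas as pd

# Sample data from the question (integer Age and Experience assumed)
df = pd.DataFrame({
    "Name": ["Archana", "Sharad", "Jitesh", "Sukanya", "Shirish"],
    "Age": [35, 39, 30, 45, 40],
    "Experience": [8, 12, 2, 18, 15],
    "Education": ["Bachelors", "Bachelors", "Diploma", "Bachelors", "Bachelors"],
})

# eq/gt are method forms of == and >; method calls bind before &, so no parentheses needed
mask = df["Education"].eq("Bachelors") & df["Experience"].gt(10) & df["Age"].gt(30)

# Flag column: True/False converted to 1/0
df["promotion"] = mask.astype(int)
print(df[["Name", "promotion"]])

# Filtering: keep only the rows that meet all three conditions
promoted = df.loc[mask, "Name"].tolist()
print(promoted)  # ['Sharad', 'Sukanya', 'Shirish']
```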