Home > Net >  Retrieving the columns based on a condition in dataframe
Retrieving the columns based on a condition in dataframe

Time:12-10

Input Dataframe:

 id typeofAddress         city state
  1          Home      Kolkata    WB
  1        Office     Calcutta    WB
  2          Home     Columbus    OH
  3          Home                   
  3        Office SanFrancisco    CA

I have to pull in the rows where typeofAddress is Home and city is not empty otherwise pull in the rows where typeofAddress is Office

Output:

 id typeofAddress         city state
  1          Home      Kolkata    WB
  2          Home     Columbus    OH
  3        Office SanFrancisco    CA

CodePudding user response:

Create a rank column according your priorities:

condlist = [df['typeofAddress'].eq('Home') & df['city'].ne(''),
            df['typeofAddress'].eq('Office') & df['city'].ne('')]
rank = np.select(condlist, choicelist=[1, 2], default=3)

out = df.assign(rank=rank).sort_values('rank') \
        .groupby('id').first() \
        .drop(columns='rank').reset_index()

Output:

>>> out
   id typeofAddress          city state
0   1          Home       Kolkata    WB
1   2          Home      Columbus    OH
2   3        Office  SanFrancisco    CA

CodePudding user response:

You can solve this using boolean masks on your DataFrame. You can google "boolean masking with pandas" for more details.

import pandas as pd
d={'typeofAddress':['Home','Office','Home','Home','Office'],'city':['Kolkata','Calcutta','Columbus','','SanFrancisco'],'state':['WB','WB','OH','','CA']}
df=pd.DataFrame(d)
output=df[((df['typeofAddress']=='Home')&(df['city']!=''))|(df['typeofAddress']=='Office')]
output
  • Related