Input Dataframe:
id typeofAddress city state
1 Home Kolkata WB
1 Office Calcutta WB
2 Home Columbus OH
3 Home
3 Office SanFrancisco CA
I have to pull in the rows where typeofAddress
is Home
and city
is not empty otherwise pull in the rows where typeofAddress
is Office
Output:
id typeofAddress city state
1 Home Kolkata WB
2 Home Columbus OH
3 Office SanFrancisco CA
CodePudding user response:
Create a rank column according your priorities:
condlist = [df['typeofAddress'].eq('Home') & df['city'].ne(''),
df['typeofAddress'].eq('Office') & df['city'].ne('')]
rank = np.select(condlist, choicelist=[1, 2], default=3)
out = df.assign(rank=rank).sort_values('rank') \
.groupby('id').first() \
.drop(columns='rank').reset_index()
Output:
>>> out
id typeofAddress city state
0 1 Home Kolkata WB
1 2 Home Columbus OH
2 3 Office SanFrancisco CA
CodePudding user response:
You can solve this using boolean masks on your DataFrame. You can google "boolean masking with pandas" for more details.
import pandas as pd
d={'typeofAddress':['Home','Office','Home','Home','Office'],'city':['Kolkata','Calcutta','Columbus','','SanFrancisco'],'state':['WB','WB','OH','','CA']}
df=pd.DataFrame(d)
output=df[((df['typeofAddress']=='Home')&(df['city']!=''))|(df['typeofAddress']=='Office')]
output