Home > other >  Find Per group a specific value in another Column if does not exist in that group delete that group
Find Per group a specific value in another Column if does not exist in that group delete that group

Time:07-08

I am having a trouble in deleting some records from a dataframe. If i groupby a certain column and check for each this column group in another column that for each group does it have any specific value in another column if that specific value does not exist delete that whole group from first column (The column upon which we have applied group by earlier). My data look like this:

search_value_per_group Column_to_Be_Grouped
Pakistan Ehsan
Saudi Arab Irshad
Pakistan Ayesha
India Ehsan
Switzerland Ehsan
Nigeria Ehsan
Saudi Arabia Ayesha
UK Ayesha
Pakistan Zohan
Afghanistan Zohan
Iraq Zohan
Iran Zohan
USA Zohan
Netherland Irshad
Switzerland Irshad
India Irshad

I want to delete all that whole group which do not have Pakistan in that group. For example in my dataframe i will Delete All Irshad from Column_to_Be_Grouped because in all irshad i do not have Pakistan and my desired output will look like this as follows:

search_value_per_group Column_to_Be_Grouped
Pakistan Ehsan
Pakistan Ayesha
India Ehsan
Switzerland Ehsan
Nigeria Ehsan
Saudi Arabia Ayesha
UK Ayesha
Pakistan Zohan
Afghanistan Zohan
Iraq Zohan
Iran Zohan
USA Zohan

CodePudding user response:

Get all matched groups and filter them by boolean indexing with Series.isin:

groups = df.loc[df['search_value_per_group'].eq('Pakistan'),'Column_to_Be_Grouped']
df1 = df[df['Column_to_Be_Grouped'].isin(groups)]

print (df1)
   search_value_per_group Column_to_Be_Grouped
0                Pakistan                Ehsan
2                Pakistan               Ayesha
3                   India                Ehsan
4             Switzerland                Ehsan
5                 Nigeria                Ehsan
6            Saudi Arabia               Ayesha
7                      UK               Ayesha
8                Pakistan                Zohan
9             Afghanistan                Zohan
10                   Iraq                Zohan
11                   Iran                Zohan
12                    USA                Zohan

CodePudding user response:

You can use GroupBy.transform('any') to generate a boolean Series for boolean indexing:

out = df[df['search_value_per_group'].eq('Pakistan')
         .groupby(df['Column_to_Be_Grouped']).transform('any')]

output:

   search_value_per_group Column_to_Be_Grouped
0                Pakistan                Ehsan
2                Pakistan               Ayesha
3                   India                Ehsan
4             Switzerland                Ehsan
5                 Nigeria                Ehsan
6            Saudi Arabia               Ayesha
7                      UK               Ayesha
8                Pakistan                Zohan
9             Afghanistan                Zohan
10                   Iraq                Zohan
11                   Iran                Zohan
12                    USA                Zohan
  • Related