Home > Enterprise >  Pandas filter up to a specific value within a group
Pandas filter up to a specific value within a group

Time:04-12

I'm working with this dataframe:

Keyword    URL    Type              Position_Group 
A          A      Ad                1      
A          B      Ad                2       
A          C      Organic           1           
A          D      Organic           2          
A          E      Organic           3
A          F      Featured_Snippet  1           
..
A          P      Organic           20
A          Q      Organic           21
A          R      Ad                6     

I want to be able to filter up to Type = Organic & Position_Group <= 20 while also being inclusive of the other Type (Ad & Featured_Snippet) since its results fall within top 20 organic positions. The goal of this use case is to be able to filter up to top 20 Organic positions while also capturing other Type in between.

The expected output should be like this:

Keyword    URL    Type              Position_Group 
A          A      Ad                1
A          B      Ad                2             
A          C      Organic           1           
A          D      Organic           2          
A          E      Organic           3
A          F      Featured_Snippet  1           
..
A          P      Organic           20

Thanks in advance!

CodePudding user response:

Assuming the dataframe is df:

df.iloc[: list(df[(df["Type"] == "Organic") & (df["Position_Group"] == 20)].index)[0] 1]

This filters the dataframe by Type == "Organic" and Position_Group == 20, then returns the index to a list and takes the first item (there should only be one item, but I have found the list is necessary to get the actual value, not Int64Index([6], dtype='int64')). 1 is added to this so that it is included, and the rows up to there are taken from the dataframe.

EDIT

For the addition of for each x in Keyword, it can be completed with a for loop:

for i, j in df.groupby("Keyword"):
    j.reset_index(drop=True, inplace=True)
    print(j.iloc[: list(j[(j["Type"] == "Organic") & (j["Position_Group"] == 20)].index)[0] 1])

I have used print here, but if you wanted to create a new dataframe with only these rows:

df2 = pd.DataFrame(columns=df.columns)

for i, j in df.groupby("Keyword"):
    j.reset_index(drop=True, inplace=True)
    df2 = pd.concat([df2, j.iloc[: list(j[(j["Type"] == "Organic") & (j["Position_Group"] == 20)].index)[0] 1]])
  • Related