Pandas groupby including value in all groups

Time:07-15

Is there an efficient way to group by some columns while keeping a keyword value in all groups? For instance, the word "all" should belong to every group instead of forming its own group.

Consider this DataFrame:

import pandas as pd

df = pd.DataFrame({'ID':         ['one',    'two',  'two',   'two',   'one'],
                   'condition1': ['all',    'red',  'all',   'green',   'red'],
                   'condition2': ['yellow', 'black','black', 'orange', 'all']})

df.groupby(['condition1','condition2']).apply(print)

So the row ['all', 'black'] should be in the same group as ['red', 'black']. The expected output is:

    ID condition1 condition2
0  one        all     yellow
    ID condition1 condition2
2  two        all      black  # here is the point of the problem
1  two        red      black
4  one        red      black
    ID condition1 condition2
3  two      green        all
2  two        all      black  # this row belongs to this group too

I tried replacing 'all' with the set of that column's values and exploding it. It works, but it is not efficient on real-life DataFrames.
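A minimal sketch of that explode approach, assuming 'all' should stand for every other value seen in the same column. The names keys, WILDCARD and expanded are illustrative, not from the question. Each wildcard cell is swapped for the list of that column's concrete values and then exploded, so one original row can land in several groups:

```python
import pandas as pd

df = pd.DataFrame({'ID':         ['one',    'two',   'two',   'two',    'one'],
                   'condition1': ['all',    'red',   'all',   'green',  'red'],
                   'condition2': ['yellow', 'black', 'black', 'orange', 'all']})

keys = ['condition1', 'condition2']
WILDCARD = 'all'  # assumed wildcard keyword

expanded = df.copy()
for col in keys:
    # Concrete (non-wildcard) values of this column, in order of first appearance.
    values = [v for v in df[col].unique() if v != WILDCARD]
    # Swap each wildcard cell for the list of concrete values; keep other cells.
    expanded[col] = expanded[col].map(lambda v: values if v == WILDCARD else v)
    # One row per list element; the original index labels are repeated.
    expanded = expanded.explode(col)

for key, group in expanded.groupby(keys):
    print(key, sorted(group.index))
```

Here row 2 ('all', 'black') ends up in both ('red', 'black') and ('green', 'black'). The row count grows multiplicatively with the number of wildcard cells in a row, which is probably what makes this slow on large frames.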

Edit: I now realize that defining the groups leads to some pathological behaviour when they contain the word "all". So the question may be impossible to solve without constraining its scope or the allowed groups.
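One way to constrain the scope, sketched under an assumption the question does not state: only key combinations that appear in the data without any 'all' count as groups, and a row joins a group when each key column holds either the group's value or 'all'. The helper wildcard_groups is hypothetical. It builds one boolean mask per group instead of duplicating rows:

```python
import pandas as pd

df = pd.DataFrame({'ID':         ['one',    'two',   'two',   'two',    'one'],
                   'condition1': ['all',    'red',   'all',   'green',  'red'],
                   'condition2': ['yellow', 'black', 'black', 'orange', 'all']})

def wildcard_groups(frame, keys, wildcard='all'):
    """Hypothetical helper: yield (key, sub-frame) pairs.

    Assumption: the allowed groups are the key combinations that occur in
    the data without any wildcard. A row joins a group when, in every key
    column, it holds either the group's value or the wildcard.
    """
    combos = frame[keys].drop_duplicates()
    combos = combos[(combos != wildcard).all(axis=1)]
    for key in combos.itertuples(index=False, name=None):
        mask = pd.Series(True, index=frame.index)
        for col, value in zip(keys, key):
            # Keep rows whose value matches the group or is the wildcard.
            mask &= frame[col].isin([value, wildcard])
        yield key, frame[mask]

for key, group in wildcard_groups(df, ['condition1', 'condition2']):
    print(key, list(group.index))
```

With the sample data this yields ('red', 'black') with rows 1, 2 and 4, and ('green', 'orange') with row 3 only. Row 0 ('all', 'yellow') falls into no group because no concrete row has 'yellow', which is one of the pathological cases the edit mentions. The cost is roughly one pass over the rows per group.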

CodePudding user response:

This is not a solution, but it might help. Instead of column names, you can pass a function as groupby's by argument. The function receives an index label and returns a group key based on a condition that you define:

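def your_func(idx):
    # Check a condition on df.loc[idx] and return a group key such as 'group1'.
    # The rule below is only a placeholder; replace it with your own condition.
    row = df.loc[idx]
    return 'group1' if row['condition1'] == 'all' else 'group2'

df.groupby(your_func).apply(print)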
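A runnable version of that idea on the question's DataFrame, with a hypothetical rule in label_row. Note that groupby calls the function once per index label and uses its single return value as the key, so each row still ends up in exactly one group; on its own this cannot place an 'all' row in several groups:

```python
import pandas as pd

df = pd.DataFrame({'ID':         ['one',    'two',   'two',   'two',    'one'],
                   'condition1': ['all',    'red',   'all',   'green',  'red'],
                   'condition2': ['yellow', 'black', 'black', 'orange', 'all']})

def label_row(idx):
    # Hypothetical rule: collect every row containing 'all' into one group,
    # otherwise group by the row's concrete (condition1, condition2) pair.
    row = df.loc[idx]
    if 'all' in (row['condition1'], row['condition2']):
        return 'wildcard'
    return f"{row['condition1']}/{row['condition2']}"

# groupby calls label_row once per index label: one key per row.
groups = {key: list(g.index) for key, g in df.groupby(label_row)}
print(groups)
```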

CodePudding user response:

I still don't understand what the problem is. I ran the same code you provided and got a different result from your output.
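For reference, a minimal check of what the question's grouping does on its own (assuming a recent pandas): 'all' is compared as an ordinary string, so none of the five rows share a key and every group holds a single row, which may explain the different output:

```python
import pandas as pd

df = pd.DataFrame({'ID':         ['one',    'two',   'two',   'two',    'one'],
                   'condition1': ['all',    'red',   'all',   'green',  'red'],
                   'condition2': ['yellow', 'black', 'black', 'orange', 'all']})

# 'all' is just another string here, so it only matches itself.
grouped = df.groupby(['condition1', 'condition2'])
for key, group in grouped:
    print(key, list(group.index))
```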
