Is it possible to form groups in a dataframe with rows having a value in a column in addition to gro

Time:02-02

I have tried a lot but could not find a way to do the following, and I am not even sure it is possible in pandas.

Please see the diagram below.

Assume I have a dataframe like the one in (1). When I use dataframe.groupby() on "col-a", I get (2), and I can process the resulting groupby object as usual, for example by applying a function. My question is:

Is it possible to group the dataframe as in (3) before processing, so that the row with "1" in Col-x is included in group2 (through a condition or something similar)? Or is it possible to apply a function while processing that includes that row, which belongs to group1, in group2?

Thank you all for your attention.

One last request, and maybe the most important one :). Although I started learning pandas a while ago, as a retired software developer I still have difficulty understanding its inner mechanism. Could a pandas pro please recommend a document, book, method, or other resource for learning pandas' basic principles well? I really love it.

CodePudding user response:

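groupby accepts a function as the grouping key. pandas calls that function once for each index label, and rows for which it returns the same value end up in the same group. The function can combine column values in any way you want. Using your example, this could be done along these lines: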

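import pandas as pd

df = pd.DataFrame({'col_a': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b'],
                   'col_x': [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
                   'col_calc': [1, 1, 1, 1, 1, 99, 1, 1, 1, 1, 1, 1]})

def func(mdf, idx, col1, col2):
    # Look up both column values for the row labelled idx
    x = mdf.loc[idx, col1]
    y = mdf.loc[idx, col2]
    if x == 'a' and y == 0:
        return 'g1'
    if x == 'b' or y == 1:
        return 'g2'
    # Any other combination falls through and returns None

# groupby passes each index label of df to the lambda as x
df2 = df.groupby(lambda x: func(df, x, 'col_a', 'col_x'))['col_calc'].sum()

print(df2)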

which gives:

g1      5
g2    105
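
The function above is called once per row, which can get slow on large dataframes. As a possible alternative, here is a minimal sketch that builds the group labels for all rows at once and passes them to groupby as a Series. The names in_g1, group_key and totals are made up for illustration, and the sketch assumes col_a only ever contains 'a' or 'b', so every row that is not in g1 is put in g2:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_a": ["a"] * 6 + ["b"] * 6,
    "col_x": [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    "col_calc": [1, 1, 1, 1, 1, 99, 1, 1, 1, 1, 1, 1],
})

# True for the rows that stay in g1: col_a is 'a' and col_x is 0
in_g1 = (df["col_a"] == "a") & (df["col_x"] == 0)

# Assumption: everything that is not g1 goes to g2 (col_a is only 'a' or 'b')
group_key = pd.Series(np.where(in_g1, "g1", "g2"), index=df.index, name="group")

# A Series with the same index as df can be used directly as the grouping key
totals = df.groupby(group_key)["col_calc"].sum()
print(totals)
```

The same group_key can be reused for other per-group processing, for example df.groupby(group_key).apply(some_function), where some_function stands for whatever you want to run on each group.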