Does a flexible group by function exist?

Time:09-20

Given the following DataFrame:

Name_A Name_B   A  B Type_A Type_B
  Test   Test   5  5   Game   Game
   nan   Test nan  5    nan   Game
   nan   Test nan 10    nan   Game
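To make the example reproducible, the frame can be built directly. This is a minimal sketch that assumes the `nan` cells are real missing values (`np.nan`) rather than the string `"nan"`.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the "nan" cells are assumed to be np.nan.
df = pd.DataFrame({
    "Name_A": ["Test", np.nan, np.nan],
    "Name_B": ["Test", "Test", "Test"],
    "A": [5, np.nan, np.nan],
    "B": [5, 5, 10],
    "Type_A": ["Game", np.nan, np.nan],
    "Type_B": ["Game", "Game", "Game"],
})
print(df)
```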

Running:

DF.groupby(['Name_A', 'Type_A'], as_index=False)[['A', 'B']].sum()
NewDF = DF.where(DF['Type_A'] == "Game")

returns the following:

Name_A Name_B A B Type_A Type_B
  Test    nan 5 5   Game   Game
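The B value stays at 5 because `groupby` drops rows whose key (`Name_A` or `Type_A`) is NaN by default (`dropna=True`), so rows 1 and 2 never reach the sum. A short sketch, assuming `np.nan` for the missing cells; `dropna=False` (available since pandas 1.1) keeps those rows, but only as a separate NaN group, so it does not fix the problem on its own.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with np.nan assumed for the "nan" cells.
df = pd.DataFrame({
    "Name_A": ["Test", np.nan, np.nan],
    "Name_B": ["Test", "Test", "Test"],
    "A": [5, np.nan, np.nan],
    "B": [5, 5, 10],
    "Type_A": ["Game", np.nan, np.nan],
    "Type_B": ["Game", "Game", "Game"],
})

# Default dropna=True: rows whose Name_A/Type_A key is NaN are excluded,
# so only row 0 contributes and B sums to 5.
grouped = df.groupby(["Name_A", "Type_A"], as_index=False)[["A", "B"]].sum()
print(grouped)

# dropna=False keeps those rows, but in their own (NaN, NaN) group,
# so their B values are still not added to the "Test" row.
grouped_all = df.groupby(
    ["Name_A", "Type_A"], as_index=False, dropna=False
)[["A", "B"]].sum()
print(grouped_all)
```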

But I would like to get:

Name_A Name_B A  B Type_A Type_B
  Test   Test 5 20   Game   Game

Is this possible?

Maybe I need to use merge instead of groupby? The answer might be close to this one, but I need a different groupby.
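The merge idea can work: aggregate each side with its own keys, then join the two summaries. This is one possible sketch, not the approach used in the answer, and it assumes `np.nan` for the missing cells and that A-side and B-side totals should be paired when name and type are equal.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name_A": ["Test", np.nan, np.nan],
    "Name_B": ["Test", "Test", "Test"],
    "A": [5, np.nan, np.nan],
    "B": [5, 5, 10],
    "Type_A": ["Game", np.nan, np.nan],
    "Type_B": ["Game", "Game", "Game"],
})

# Aggregate each side with its own keys (rows with NaN keys are dropped).
side_a = df.groupby(["Name_A", "Type_A"], as_index=False)["A"].sum()
side_b = df.groupby(["Name_B", "Type_B"], as_index=False)["B"].sum()

# Pair the two summaries where name and type match.
result = side_a.merge(
    side_b,
    left_on=["Name_A", "Type_A"],
    right_on=["Name_B", "Type_B"],
)
result = result[["Name_A", "Name_B", "A", "B", "Type_A", "Type_B"]]
print(result)
```

Because each side is grouped independently, the B rows with a missing `Name_A` still count toward the `Test`/`Game` total.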

Answer:

Given the OP's new requirement, one can achieve this using numpy.where, pandas.DataFrame.astype (to handle the nan values) and .sum, as follows:

import numpy as np

df['A'] = np.where((df['Name_A'] == 'Test') & (df['Type_A'] == 'Game'), df['A'].astype(float).sum(), df['A'])

df['B'] = np.where((df['Name_B'] == 'Test') & (df['Type_B'] == 'Game'), df['B'].astype(float).sum(), df['B'])

[Out]:
  Name_A Name_B    A     B Type_A Type_B
0   Test   Test  5.0  20.0   Game   Game
1    nan   Test  nan  20.0    nan   Game
2    nan   Test  nan  20.0    nan   Game
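Note that `df['B'].astype(float).sum()` in the answer adds up the whole column, not only the rows that match the condition. With this data every B row matches, so the result is the same. If only matching rows should count, one possible tweak (a sketch, assuming `np.nan` for the missing cells) is to reuse the boolean mask inside the sum:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name_A": ["Test", np.nan, np.nan],
    "Name_B": ["Test", "Test", "Test"],
    "A": [5, np.nan, np.nan],
    "B": [5, 5, 10],
    "Type_A": ["Game", np.nan, np.nan],
    "Type_B": ["Game", "Game", "Game"],
})

mask_a = (df["Name_A"] == "Test") & (df["Type_A"] == "Game")
mask_b = (df["Name_B"] == "Test") & (df["Type_B"] == "Game")

# Sum only the rows that satisfy each condition, not the whole column.
total_a = df.loc[mask_a, "A"].astype(float).sum()
total_b = df.loc[mask_b, "B"].astype(float).sum()

df["A"] = np.where(mask_a, total_a, df["A"])
df["B"] = np.where(mask_b, total_b, df["B"])
print(df)
```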

Then, since the OP only wants the first row, one can use pandas.DataFrame.iloc:

df = df.iloc[0:1]

[Out]:
  Name_A Name_B    A     B Type_A Type_B
0   Test   Test  5.0  20.0   Game   Game
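Slicing with `iloc[0:1]` rather than `iloc[0]` is what keeps the result a one-row DataFrame; a scalar position returns a Series instead. A small sketch on a throwaway frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [5.0, 1.0], "B": [20.0, 2.0]})

first_frame = df.iloc[0:1]  # slice: stays a one-row DataFrame
first_series = df.iloc[0]   # scalar position: returns a Series
print(type(first_frame).__name__, type(first_series).__name__)
```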

One can wrap all of these operations in a function, here called gamefunc:

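import numpy as np

def gamefunc(df):

    df = df.copy()  # work on a copy so the caller's DataFrame is not modified

    df['A'] = np.where((df['Name_A'] == 'Test') & (df['Type_A'] == 'Game'), df['A'].astype(float).sum(), df['A'])

    df['B'] = np.where((df['Name_B'] == 'Test') & (df['Type_B'] == 'Game'), df['B'].astype(float).sum(), df['B'])

    df = df.iloc[0:1]  # keep only the first row, as a DataFrame

    return df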

Then all one has to do is apply the function to the DataFrame:

df = gamefunc(df)

[Out]:
  Name_A Name_B    A     B Type_A Type_B
0   Test   Test  5.0  20.0   Game   Game