Having the following DF :
Name_A | Name_B | A | B | Type_A | Type_B |
---|---|---|---|---|---|
Test | Test | 5 | 5 | Game | Game |
nan | Test | nan | 5 | nan | Game |
nan | Test | nan | 10 | nan | Game |
doing :
DF.groupby(['Name_A',"Type_A"], as_index=False)[['A','B']].sum()
NewDF = DF.where(DF['TypeA']== "Game")
returns the following
Name_A | Name_B | A | B | Type_A | Type_B |
---|---|---|---|---|---|
Test | nan | 5 | 5 | Game | Game |
But I would like to get :
Name_A | Name_B | A | B | Type_A | Type_B |
---|---|---|---|---|---|
Test | Test | 5 | 20 | Game | Game |
Is it possible to do so ?
Maybe I need to use merge
instead of group by? The answer might be close to this one but I need to use a different group by.
CodePudding user response:
Given OP new requirement, achieve that using numpy.where
, pandas.DataFrame.astype
(to be able to handle nan
values) and .sum
as follows
df['A'] = np.where((df['Name_A'] == 'Test') & (df['Type_A'] == 'Game'), df['A'].astype(float).sum(), df['A'])
df['B'] = np.where((df['Name_B'] == 'Test') & (df['Type_B'] == 'Game'), df['B'].astype(float).sum(), df['B'])
[Out]:
Name_A Name_B A B Type_A Type_B
0 Test Test 5.0 20.0 Game Game
1 nan Test nan 20.0 nan Game
2 nan Test nan 20.0 nan Game
Then, considering that OP only wants to retrieve the first line, one can use pandas.DataFrame.iloc
df = df.iloc[0:1]
[Out]:
Name_A Name_B A B Type_A Type_B
0 Test Test 5.0 20.0 Game Game
One can wrap all those operations in a function, let's call it gamefunc
as follows
def gamefunc(df):
df['A'] = np.where((df['Name_A'] == 'Test') & (df['Type_A'] == 'Game'), df['A'].astype(float).sum(), df['A'])
df['B'] = np.where((df['Name_B'] == 'Test') & (df['Type_B'] == 'Game'), df['B'].astype(float).sum(), df['B'])
df = df.iloc[0:1]
return df
And then all one has to do is to apply the function to the dataframe
df = gamefunc(df)
[Out]:
Name_A Name_B A B Type_A Type_B
0 Test Test 5.0 20.0 Game Game