Home > OS >  Create new column from a row value in a grouped data frame?
Create new column from a row value in a grouped data frame?

Time:06-30

I have a data frame data

data_ = {'ID': [777, 777, 777,777,777,777],'Month':[1,1,1,2,2,2], 'Salary': [130,170,50,140,180,60], 'O': ["AC","BR","BR","AC","BR","BR"], 'D':["LF","AC","LF","LF","AC","LF"], 'B':[True,True,False,True,True,False]}

data = pd.DataFrame(data=data_)

enter image description here for each subgroup of this data frame:

  Subgroup=data.groupby(["ID","Month"])

I would like to add a new column NEW_Salary filled with the values of Salary where B is false in each subgroup as show in the picture below. I don't know exactly how I can do that enter image description here

CodePudding user response:

Filter by B False before groupby then join with your initial dataframe

Subgroup = data[~data['B']] \
   .groupby(["ID","Month"]) \
   .agg(NEW_SALARY = ('Salary', 'mean')) \
   .reset_index()

final_df = data.merge(Subgroup, on=['ID', 'Month'])
final_df

Output:

enter image description here

CodePudding user response:

Let us try transform with mask

df['New_Salary'] = df['Salary'].mask(df['B']).groupby([df['ID'],df['Month']]).transform('max')
df
Out[255]: 
    ID  Month  Salary   O   D      B  New_Salary
0  777      1     130  AC  LF   True        50.0
1  777      1     170  BR  AC   True        50.0
2  777      1      50  BR  LF  False        50.0
3  777      2     140  AC  LF   True        60.0
4  777      2     180  BR  AC   True        60.0
5  777      2      60  BR  LF  False        60.0

CodePudding user response:

If possible multiple False values per groups and need aggregate them, e.g. by sum create helper column new filled by 0 if no match and then use GroupBy.transform for new column:

data['New_Salary'] = (data.assign(new=data['Salary'].mask(data['B'], 0))
                          .groupby(["ID","Month"])['new']
                          .transform('sum'))
print (data)
    ID  Month  Salary   O   D      B  New_Salary
0  777      1     130  AC  LF   True          50
1  777      1     170  BR  AC   True          50
2  777      1      50  BR  LF  False          50
3  777      2     140  AC  LF   True          60
4  777      2     180  BR  AC   True          60
5  777      2      60  BR  LF  False          60

If there is only one False value per groups is possible use DataFrame.join with MultiIndex Series filtered only False rows:

s = data.loc[~data['B']].set_index(["ID","Month"])['Salary'].rename('New_Salary')
data = (data.join(s, on=['ID','Month']))
print (data)
    ID  Month  Salary   O   D      B  New_Salary
0  777      1     130  AC  LF   True          50
1  777      1     170  BR  AC   True          50
2  777      1      50  BR  LF  False          50
3  777      2     140  AC  LF   True          60
4  777      2     180  BR  AC   True          60
5  777      2      60  BR  LF  False          60
  • Related