Sum two columns in a grouped data frame using shift()

Time:09-15

I have a data frame df in which I would like to create a new column ID that is a diagonal combination of two other columns, ID1 and ID2.

This is the data frame:

import pandas as pd

df = pd.DataFrame({'Employee':[5,5,5,20,20],
                   'Department':[4,4,4,6,6],
                   'ID1':['AB','CD','EF','XY','AA'],
                   'ID2':['CD','EF','GH','AA','ZW']})

This is what the initial data frame looks like:

   Employee  Department ID1 ID2
0         5           4  AB  CD
1         5           4  CD  EF
2         5           4  EF  GH
3        20           6  XY  AA
4        20           6  AA  ZW

If I group df by Employee & Department:

df2=df.groupby(["Employee","Department"])

There are only two kinds of groups: groups with two rows and groups with three rows.

Within each group, ID should be the concatenation of ID1 from the current row and ID2 from the next row. For the last row of the group, ID takes the value of the previous row's ID.

Expected output:

   Employee  Department ID1 ID2    ID
0         5           4  AB  CD  ABEF
1         5           4  CD  EF  CDGH
2         5           4  EF  GH  CDGH
3        20           6  XY  AA  XYZW
4        20           6  AA  ZW  XYZW

I thought about using shift():

df2["ID"]=df["ID1"] + df["ID2"].shift(-1)

But I could not quite figure it out. Any ideas?

CodePudding user response:

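(df["ID1"] + df.groupby(["Employee", "Department"])["ID2"].shift(-1)).ffill()

This is almost your code, but we group first and then shift ID2 up by one row within each group. The last row of each group has no next row, so the concatenation is NaN there; a final forward fill copies the previous row's value into those rows.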

In [24]: df
Out[24]:
   Employee  Department ID1 ID2
0         5           4  AB  CD
1         5           4  CD  EF
2         5           4  EF  GH
3        20           6  XY  AA
4        20           6  AA  ZW

In [25]: df["ID"] = (df["ID1"] + df.groupby(["Employee", "Department"])["ID2"].shift(-1)).ffill()

In [26]: df
Out[26]:
   Employee  Department ID1 ID2    ID
0         5           4  AB  CD  ABEF
1         5           4  CD  EF  CDGH
2         5           4  EF  GH  CDGH
3        20           6  XY  AA  XYZW
4        20           6  AA  ZW  XYZW
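
To make the steps above easier to follow, here is a minimal sketch that splits the one-liner into intermediate results and compares the group-wise shift with a plain shift. The names `keys`, `ungrouped`, `next_id2` and `combined` are only illustrative and do not appear in the question.

```python
import pandas as pd

df = pd.DataFrame({'Employee': [5, 5, 5, 20, 20],
                   'Department': [4, 4, 4, 6, 6],
                   'ID1': ['AB', 'CD', 'EF', 'XY', 'AA'],
                   'ID2': ['CD', 'EF', 'GH', 'AA', 'ZW']})

keys = ['Employee', 'Department']  # illustrative name for the grouping columns

# A plain shift ignores group boundaries: row 2 picks up 'AA' from the next group.
ungrouped = df['ID1'] + df['ID2'].shift(-1)

# Step 1: shift ID2 up by one row inside each group;
# the last row of every group becomes NaN.
next_id2 = df.groupby(keys)['ID2'].shift(-1)

# Step 2: concatenate; a string plus NaN stays NaN.
combined = df['ID1'] + next_id2

# Step 3: forward fill so each group's last row repeats the previous value.
df['ID'] = combined.ffill()

print(df)
```

With the plain shift, row 2 would become `EFAA`, which mixes two employees; the group-wise shift leaves that row empty so the forward fill can supply `CDGH` instead.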

CodePudding user response:

You can groupby.shift, concatenate, and ffill:

df['ID'] = (df['ID1'] + df.groupby(['Employee', 'Department'])['ID2'].shift(-1)
           ).ffill()

output:

   Employee  Department ID1 ID2    ID
0         5           4  AB  CD  ABEF
1         5           4  CD  EF  CDGH
2         5           4  EF  GH  CDGH
3        20           6  XY  AA  XYZW
4        20           6  AA  ZW  XYZW
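
One caveat, sketched under an assumption the question rules out: the final `ffill()` runs over the whole column, so it relies on every group having at least two rows. If a single-row group could occur (hypothetical data below, with Employee 7 added purely for illustration), a plain forward fill would borrow the value from the preceding group. Forward filling per group is one possible way to avoid that.

```python
import pandas as pd

# Hypothetical data: Employee 7 forms a single-row group (not in the question).
df = pd.DataFrame({'Employee': [5, 5, 7, 20, 20],
                   'Department': [4, 4, 4, 6, 6],
                   'ID1': ['AB', 'CD', 'EF', 'XY', 'AA'],
                   'ID2': ['CD', 'EF', 'GH', 'AA', 'ZW']})

keys = ['Employee', 'Department']
combined = df['ID1'] + df.groupby(keys)['ID2'].shift(-1)

# Plain forward fill: row 2 inherits 'ABEF' from Employee 5.
plain = combined.ffill()

# Group-wise forward fill: row 2 stays NaN because its group has no earlier row.
grouped = combined.groupby([df[k] for k in keys]).ffill()

print(pd.DataFrame({'plain': plain, 'grouped': grouped}))
```

For the data in the question, where every group has two or three rows, both variants give the same result.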