I have a data frame df where I would like to create a new column ID, which is a diagonal combination of two other columns, ID1 and ID2.
This is the data frame:
import pandas as pd
df = pd.DataFrame({'Employee': [5, 5, 5, 20, 20],
                   'Department': [4, 4, 4, 6, 6],
                   'ID1': ['AB', 'CD', 'EF', 'XY', 'AA'],
                   'ID2': ['CD', 'EF', 'GH', 'AA', 'ZW']})
This is what the initial data frame looks like:
Employee Department ID1 ID2
0 5 4 AB CD
1 5 4 CD EF
2 5 4 EF GH
3 20 6 XY AA
4 20 6 AA ZW
If I group df by Employee and Department:
df2=df.groupby(["Employee","Department"])
there are only two kinds of groups: groups containing two rows and groups containing three rows.
The column ID should be the concatenation of ID1 of the current row and ID2 of the next row in the same group; for the last row of each group, ID takes the value of the previous row's ID.
Expected output:
Employee Department ID1 ID2 ID
0 5 4 AB CD ABEF
1 5 4 CD EF CDGH
2 5 4 EF GH CDGH
3 20 6 XY AA XYZW
4 20 6 AA ZW XYZW
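For reference, the rule above can be spelled out as an explicit loop over the groups. This is only a slow, minimal sketch to pin down the expected result, not a suggested solution; like the question, it assumes every group has at least two rows (a one-row group would have no previous ID to copy).

```python
import pandas as pd

df = pd.DataFrame({'Employee': [5, 5, 5, 20, 20],
                   'Department': [4, 4, 4, 6, 6],
                   'ID1': ['AB', 'CD', 'EF', 'XY', 'AA'],
                   'ID2': ['CD', 'EF', 'GH', 'AA', 'ZW']})

ids = {}
for _, group in df.groupby(['Employee', 'Department']):
    labels = list(group.index)
    for pos, label in enumerate(labels):
        if pos + 1 < len(labels):
            # ID1 of this row followed by ID2 of the next row in the same group
            ids[label] = group.at[label, 'ID1'] + group.at[labels[pos + 1], 'ID2']
        else:
            # Last row of the group: reuse the previous row's ID
            # (assumes the group has at least two rows)
            ids[label] = ids[labels[pos - 1]]

# pd.Series(ids) is indexed by the original row labels, so assignment aligns
df['ID'] = pd.Series(ids)
print(df)
```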
I thought about using shift():
df["ID"] = df["ID1"] + df["ID2"].shift(-1)
but I could not quite figure it out. Any ideas?
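To see why a plain shift(-1) is not enough, here is a small sketch using the same data: shifting the whole column ignores group boundaries, so the last row of the first group picks up ID2 from the first row of the next group, and the very last row gets NaN.

```python
import pandas as pd

df = pd.DataFrame({'Employee': [5, 5, 5, 20, 20],
                   'Department': [4, 4, 4, 6, 6],
                   'ID1': ['AB', 'CD', 'EF', 'XY', 'AA'],
                   'ID2': ['CD', 'EF', 'GH', 'AA', 'ZW']})

# Shifting over the whole column crosses from one group into the next
naive = df['ID1'] + df['ID2'].shift(-1)
print(naive.tolist())  # ['ABEF', 'CDGH', 'EFAA', 'XYZW', nan]
```

Row 2 becomes 'EFAA' (EF from group (5, 4) plus AA from group (20, 6)) instead of 'CDGH'.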
Answer:
(df["ID1"] + df.groupby(["Employee", "Department"])["ID2"].shift(-1)).ffill()
This is almost your code, but we first group by Employee and Department and then shift ID2 up within each group. Lastly, we forward-fill the NaN that the shift leaves in the last row of each group.
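Broken into its three steps, the one-liner produces the following intermediate values (a sketch on the question's data; the intermediate variable names are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Employee': [5, 5, 5, 20, 20],
                   'Department': [4, 4, 4, 6, 6],
                   'ID1': ['AB', 'CD', 'EF', 'XY', 'AA'],
                   'ID2': ['CD', 'EF', 'GH', 'AA', 'ZW']})

keys = ['Employee', 'Department']
# 1. Next row's ID2 within each group; the last row of each group gets NaN
next_id2 = df.groupby(keys)['ID2'].shift(-1)  # ['EF', 'GH', NaN, 'ZW', NaN]
# 2. String concatenation; NaN propagates to the result
combined = df['ID1'] + next_id2               # ['ABEF', 'CDGH', NaN, 'XYZW', NaN]
# 3. Fill each NaN with the value from the row above
df['ID'] = combined.ffill()                   # ['ABEF', 'CDGH', 'CDGH', 'XYZW', 'XYZW']
print(df)
```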
In [24]: df
Out[24]:
Employee Department ID1 ID2
0 5 4 AB CD
1 5 4 CD EF
2 5 4 EF GH
3 20 6 XY AA
4 20 6 AA ZW
In [25]: df["ID"] = (df["ID1"] + df.groupby(["Employee", "Department"])["ID2"].shift(-1)).ffill()
In [26]: df
Out[26]:
Employee Department ID1 ID2 ID
0 5 4 AB CD ABEF
1 5 4 CD EF CDGH
2 5 4 EF GH CDGH
3 20 6 XY AA XYZW
4 20 6 AA ZW XYZW
Answer:
You can use groupby.shift, concatenate, and ffill:
df['ID'] = (df['ID1'] + df.groupby(['Employee', 'Department'])['ID2'].shift(-1)
            ).ffill()
output:
Employee Department ID1 ID2 ID
0 5 4 AB CD ABEF
1 5 4 CD EF CDGH
2 5 4 EF GH CDGH
3 20 6 XY AA XYZW
4 20 6 AA ZW XYZW
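Both answers rely on the rows of each group being contiguous, because a plain ffill() copies from whichever row is directly above. If the rows were interleaved (the frame below is a made-up reordering of the question's data), a group-aware forward fill keeps values inside their own group. This is a sketch of that variant, not part of either answer.

```python
import pandas as pd

# Hypothetical interleaved version of the question's data
df = pd.DataFrame({'Employee': [5, 20, 5, 20, 5],
                   'Department': [4, 6, 4, 6, 4],
                   'ID1': ['AB', 'XY', 'CD', 'AA', 'EF'],
                   'ID2': ['CD', 'AA', 'EF', 'ZW', 'GH']})

keys = ['Employee', 'Department']
combined = df['ID1'] + df.groupby(keys)['ID2'].shift(-1)

# Plain ffill: row 3 (group 20/6) wrongly inherits 'CDGH' from group 5/4
plain = combined.ffill()

# Group-aware ffill: each NaN is filled only from earlier rows of its own group
df['ID'] = combined.groupby([df['Employee'], df['Department']]).ffill()
print(df)
```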