i have just started using Pandas and i'm amazed by how flexible it is but i've hit a roadblock and need some help
I have a Data frame Df:
| Contract | Year | Val1 | Val2 | Val3 |
| A | 2020 | 90 | 95. | 100 |
|A | 2019 | 80 | 85.| 90|
|A | 2018 | 75. | 70. | 80. |
| B |2020 | 90. | 95. | 100 |
| B | 2019| 80 | 85. | 90|
| B |2018 | 75. | 70. | 80 |
I wanted to find row wise running difference for Val1,Val2,Val3 after grouping the data on contract and sorting the data on contract and year so that the output looks something like:
Output:
| Contract | Year | Val1 | Val2 | Val3 |
| A | 2020 | 10 |10. | 10 |
| A | 2019 |5 |15. | 10 |
|A |2018 |NaN | NaN | NaN|
| B |2020 |10. | 10. | 10 |
| B |2019 |5 |15. | 10. |
|B | 2018 | NaN | NaN | NaN |
I tried doing this using the sort groupby and diff in Pandas but i'm not getting the right output:
Code i tried:
Df.sort_values(['Contract','Year'],ascending=False).groupby(['Contract']).diff()
This doesn't include Contract and year in the output. Kindly help me out where i'm going wrong.
Thanks
CodePudding user response:
If I understand you correctly, after the .sort_values
you can do:
df.loc[:, "Val1":"Val3"] = df.groupby(df["Contract"]).diff().mul(-1).shift(-1)
print(df)
Prints:
Contract Year Val1 Val2 Val3
3 B 2020 10.0 10.0 10.0
4 B 2019 5.0 15.0 10.0
5 B 2018 NaN NaN NaN
0 A 2020 10.0 10.0 10.0
1 A 2019 5.0 15.0 10.0
2 A 2018 NaN NaN NaN