Home > Software design >  Pandas Sort, Groupby and running difference
Pandas Sort, Groupby and running difference

Time:07-11

i have just started using Pandas and i'm amazed by how flexible it is but i've hit a roadblock and need some help

I have a Data frame Df:

| Contract | Year | Val1 | Val2 | Val3 |

| A | 2020 | 90 | 95. | 100 |

|A | 2019 | 80 | 85.| 90|

|A | 2018 | 75. | 70. | 80. |

| B |2020 | 90. | 95. | 100 |

| B | 2019| 80 | 85. | 90|

| B |2018 | 75. | 70. | 80 |

I wanted to find row wise running difference for Val1,Val2,Val3 after grouping the data on contract and sorting the data on contract and year so that the output looks something like:

Output:

| Contract | Year | Val1 | Val2 | Val3 |

| A | 2020 | 10 |10. | 10 |

| A | 2019 |5 |15. | 10 |

|A |2018 |NaN | NaN | NaN|

| B |2020 |10. | 10. | 10 |

| B |2019 |5 |15. | 10. |

|B | 2018 | NaN | NaN | NaN |

I tried doing this using the sort groupby and diff in Pandas but i'm not getting the right output:

Code i tried:

Df.sort_values(['Contract','Year'],ascending=False).groupby(['Contract']).diff()

This doesn't include Contract and year in the output. Kindly help me out where i'm going wrong.

Thanks

CodePudding user response:

If I understand you correctly, after the .sort_values you can do:

df.loc[:, "Val1":"Val3"] = df.groupby(df["Contract"]).diff().mul(-1).shift(-1)
print(df)

Prints:

  Contract  Year  Val1  Val2  Val3
3        B  2020  10.0  10.0  10.0
4        B  2019   5.0  15.0  10.0
5        B  2018   NaN   NaN   NaN
0        A  2020  10.0  10.0  10.0
1        A  2019   5.0  15.0  10.0
2        A  2018   NaN   NaN   NaN
  • Related