Home > Enterprise >  Python: calculating difference between rows
Python: calculating difference between rows

Time:11-23

I have a dataframe listing revenue by company and year. See below:

           Company |      Acc_Name  |Date       | Value
               A2M  Sales/Revenue  2016  167770000.0
               A2M  Sales/Revenue  2017  360842000.0
               A2M  Sales/Revenue  2018   68087000.0
               A2M  Sales/Revenue  2019  963000000.0
               A2M  Sales/Revenue  2020  143346000.0

In python I want to create a new column showing the difference year on year, so 2017 will show the variance between 2017 & 2016. I'm wanting to run this on a large dataframe with multiple companies.

CodePudding user response:

Here is my solution which creates a new column with previous year data and then simply takes the differences of them:

df["prev_val"] = df["Value"].shift(1) # creates new column with previous year data
df["Difference"] = df["Value"] - df["prev_val"]

Since you are willing to do this on several companies, make sure that you filter out other companies by

this_company_df = df[df["Company"] == "A2M"]

and order data in ascending order by

this_company_df = this_company_df.sort_values(by=["Date"], ascending=True)

So, the final code code should look something like this:

this_company_df = df[df["Company"] == "A2M"] 
this_company_df = this_company_df.sort_values(by=["Date"], ascending=True)
this_company_df["prev_val"] = this_company_df["Value"].shift(1) 
this_company_df["Difference"] = this_company_df["Value"] - this_company_df["prev_val"]

So, the result is stored in "Difference" column. One more thing you could improve is to take care of initial year by setting it to 0.

CodePudding user response:

revenues['Revenue_Change'] = revenues['Value'].diff(periods=1)

Is the simplest way to do it. However, since your dataframe contains data for multiple companies, you can use this:

revenues['Revenue_Change'] = revenues.groupby('Company',sort=False)['Value'].diff(periods=1)

This sets the first entry for each company in the set to NAN.

If, by any chance, the dataframe is not in order, you can use

revenues = revenues.sort_values('Company')

Groupby will correctly calculate YoY revenue change, even if entries are separated from one another, as long as the actual revenues are chronologically in order for each company.

EDIT:

If everything is out of order, then sort by the year, groupby and then sort by company name:

revenues = revenues.sort_values('Date')
revenues['Revenue_Change'] = revenues.groupby('Company',sort=False)['Value'].diff()    
revenues = revenues.sort_values('Company')
  • Related