I have a dataframe listing revenue by company and year. See below:
Company | Acc_Name |Date | Value
A2M Sales/Revenue 2016 167770000.0
A2M Sales/Revenue 2017 360842000.0
A2M Sales/Revenue 2018 68087000.0
A2M Sales/Revenue 2019 963000000.0
A2M Sales/Revenue 2020 143346000.0
In python I want to create a new column showing the difference year on year, so 2017 will show the variance between 2017 & 2016. I'm wanting to run this on a large dataframe with multiple companies.
CodePudding user response:
Here is my solution which creates a new column with previous year data and then simply takes the differences of them:
df["prev_val"] = df["Value"].shift(1) # creates new column with previous year data
df["Difference"] = df["Value"] - df["prev_val"]
Since you are willing to do this on several companies, make sure that you filter out other companies by
this_company_df = df[df["Company"] == "A2M"]
and order data in ascending order by
this_company_df = this_company_df.sort_values(by=["Date"], ascending=True)
So, the final code code should look something like this:
this_company_df = df[df["Company"] == "A2M"]
this_company_df = this_company_df.sort_values(by=["Date"], ascending=True)
this_company_df["prev_val"] = this_company_df["Value"].shift(1)
this_company_df["Difference"] = this_company_df["Value"] - this_company_df["prev_val"]
So, the result is stored in "Difference"
column. One more thing you could improve is to take care of initial year by setting it to 0.
CodePudding user response:
revenues['Revenue_Change'] = revenues['Value'].diff(periods=1)
Is the simplest way to do it. However, since your dataframe contains data for multiple companies, you can use this:
revenues['Revenue_Change'] = revenues.groupby('Company',sort=False)['Value'].diff(periods=1)
This sets the first entry for each company in the set to NAN.
If, by any chance, the dataframe is not in order, you can use
revenues = revenues.sort_values('Company')
Groupby will correctly calculate YoY revenue change, even if entries are separated from one another, as long as the actual revenues are chronologically in order for each company.
EDIT:
If everything is out of order, then sort by the year, groupby and then sort by company name:
revenues = revenues.sort_values('Date')
revenues['Revenue_Change'] = revenues.groupby('Company',sort=False)['Value'].diff()
revenues = revenues.sort_values('Company')