Count ratios conditional on 2 columns

I am new to pandas and trying to figure out how to calculate the percentage change (difference) between 2 years, given that sometimes there is no previous year.

I am given a dataframe as follows:

     company      date  amount
1    Company 1    2020  3
2    Company 1    2021  1
3    COMPANY2     2020  7
4    Company 3    2020  4
5    Company 3    2021  4
..   ...          ...   ...
766  Company N    2021  9
765  Company N    2020  1
767  Company XYZ  2021  3
768  Company X    2021  3
769  Company Z    2020  2

I wrote something like this:

for company in unique(df2.company): 
    company_df = df2[df2.company== company]
    company_df.sort_values(by ="date")
    company_df_year = company_df.amount.tolist()
    company_df_year.pop()
    company_df_year.insert(0,0)
    company_df["value_year_before"] = company_df_year
    if any in company_df.value_year_before == None: 
        company_df["diff"] = 0
    else:       
        company_df["diff"] = (company_df.amount- company_df.value_year_before)/company_df.value_year_before

df2["ratio"] = company_df["diff"]

But I keep getting NaN.

Where did I make a mistake?

CodePudding user response:

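The main issue is that company_df is overwritten on every iteration of the loop, so only the last company's result survives. When that result is assigned to df2["ratio"], pandas aligns it on the index, and the rows of every other company have nothing to match, so they come out as NaN.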
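
To see why the final assignment produces NaN, here is a small sketch. The four-row frame and the name `last_company_df` are made up for illustration; it only mimics the state the loop leaves behind, not the full loop.

```python
import pandas as pd

# Illustrative data, not the asker's full frame.
df2 = pd.DataFrame({
    "company": ["Company 1", "Company 1", "Company 3", "Company 3"],
    "date": [2020, 2021, 2020, 2021],
    "amount": [3, 1, 4, 4],
})

# Mimic the state after the loop: only the last company's subset is left.
last_company_df = df2[df2.company == "Company 3"].copy()
last_company_df["diff"] = 0.0

# Column assignment aligns on the index. Rows 0 and 1 (Company 1) have no
# matching index in last_company_df, so they are filled with NaN.
df2["ratio"] = last_company_df["diff"]

nan_count = int(df2["ratio"].isna().sum())
print(df2)
```

Only the rows belonging to the last company receive a value; all other rows are NaN.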

However, if you find yourself writing a for loop in pandas, it usually means there is an easier, vectorized way to accomplish the goal. Here you could use
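
A minimal sketch of one such loop-free approach, assuming a `groupby` on `company` followed by `pct_change` on `amount`. The sample frame is illustrative, and filling a missing previous year with 0 is an assumption taken from the `diff = 0` branch in the question, not a pandas default.

```python
import pandas as pd

# Illustrative sample modelled on the rows shown in the question.
df2 = pd.DataFrame({
    "company": ["Company 1", "Company 1", "COMPANY2", "Company 3", "Company 3"],
    "date": [2021, 2020, 2020, 2020, 2021],
    "amount": [1, 3, 7, 4, 4],
})

# Put each company's rows in chronological order.
df2 = df2.sort_values(["company", "date"]).reset_index(drop=True)

# (amount - previous amount) / previous amount, computed within each company.
# The first row of every company has no previous row and gets NaN.
df2["ratio"] = df2.groupby("company")["amount"].pct_change()

# Assumption: report 0 when there is no previous year.
df2["ratio"] = df2["ratio"].fillna(0)

print(df2)
```

Note that `pct_change` compares each row with the previous row of the same company, so if a company skips a year the change is measured against its last available year rather than the calendar year before.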

Output: