Home > Back-end >  How to calculate percentage change in rows by two column criteria
How to calculate percentage change in rows by two column criteria

Time:12-12

I have a table like this

    animal data  2000  2001    2002
    0    dog   red   1.0   1.5   2.625
    1    dog  blue   3.0   4.5   7.875
    2    cat   red   5.0   7.5  13.125
    3    cat  blue   4.0   6.0  10.500
    4   bird   red   NaN   NaN     NaN
    5   bird  blue   6.0   9.0  15.750
    randomNumbers = np.array([1, 3, 5, 4, np.NaN, 6])

    data = {
        "animal": ["dog", "dog", "cat", "cat", "bird", "bird"],
        "data": ["red", "blue", "red", "blue", "red", "blue"],
        2000: randomNumbers,
        2001: randomNumbers * 1.5,
        2002: randomNumbers * 1.5 * 1.75,
    }
    df = pd.DataFrame.from_dict(data)

I want to calculate the percentage difference year by year, for each unique 'animal', with data='red', and add it to the dataframe.

I expect a result like this

    animal        data  2000  2001    2002
    0    dog         red   1.0   1.5   2.625
    1    dog        blue   3.0   4.5   7.875
    2    cat         red   5.0   7.5  13.125
    3    cat        blue   4.0   6.0  10.500
    4   bird         red   NaN   NaN     NaN
    5   bird        blue   6.0   9.0  15.750
    6    dog  redPercent   NaN   0.5   0.750
    7    cat  redPercent   NaN   0.5   0.750
    8   bird  redPercent   NaN   NaN     NaN

So I did this:

    for animal in df["animal"].unique():
        animalRow = df[(df["animal"] == animal) & (df["data"] == "red")]
        percChange = animalRow.loc[:, 2000:2002].pct_change(axis=1)
        newRow = [animal, "redPercent"]   percChange.values.tolist()[0]

        df.loc[len(df)]=newRow

    print(df)

But it looks non pythonic. There is a better way?

CodePudding user response:

Use pct_change after slicing and setting non year column aside temporarily, then concat:

df = pd.concat(
   [df,
   (df[df['data'].eq('red')]
    .assign(data=lambda d: d['data'].add('Percent'))
   .set_index(['animal', 'data'])
   .pct_change(axis=1)
   .reset_index()
)], ignore_index=True)

Output:


  animal        data  2000  2001    2002
0    dog         red   1.0   1.5   2.625
1    dog        blue   3.0   4.5   7.875
2    cat         red   5.0   7.5  13.125
3    cat        blue   4.0   6.0  10.500
4   bird         red   NaN   NaN     NaN
5   bird        blue   6.0   9.0  15.750
6    dog  redPercent   NaN   0.5   0.750
7    cat  redPercent   NaN   0.5   0.750
8   bird  redPercent   NaN   NaN     NaN
  • Related