Home > Net >  How to compare values from previous row for % difference
How to compare values from previous row for % difference

Time:03-01

Solving with Pandas

I have this dataset this is the question: For a user selected type of public transport type, display the year and the average trip distance in which the average has declined by at least 5% over the previous year.

     Year  MRT  LRT  Bus  Taxi

0   2004  11.5  0.0  5.2   8.5
1   2005  11.5  0.0  5.4   8.8
2   2006  11.3  0.0  5.3   9.1
3   2007  11.2  0.0  5.3   9.7
4   2008  11.2  2.1  5.3   9.0
5   2009  10.8  2.1  5.1   9.0
6   2010  10.3  2.1  4.8   9.3
7   2011  10.0  2.0  4.5   9.6
8   2012   9.6  1.0  4.4   9.7
9   2013   9.5  2.0  4.3   9.8
10  2014   9.2  2.0  4.3  10.0

eg. If i choose "Bus": For easier visualization of that data column:

    Year  Bus
0   2004  5.2
1   2005  5.4
2   2006  5.3
3   2007  5.3
4   2008  5.3
5   2009  5.1
6   2010  4.8
7   2011  4.5
8   2012  4.4
9   2013  4.3
10  2014  4.3

it will compare 2005 data vs 2004 data if it has decreased more than 5%, if yes print it out, if no, carry on scanning.

expected output: "In Year 2010, data is 4.8 and has declined more than 5% compared to the previous year" (note: 5.1 * 0.95 = 4.85)

"In year 2011, data is 4.5 and has declined more than 5% compared to its previous year" (note: 4.8 * 0.95 = 4.56)

I was thinking of a for loop but I'm not really sure about how to use to it compare for 5% decrement.

CodePudding user response:

You could add a column (per transport way) taking the value of your condition:

df['hasDecreasedMarkedly'] = (df['Bus'] - df['Bus'].shift(1))/df['Bus'] >= 0.05

CodePudding user response:

You can use pandas .pct_change().

import pandas as pd

df = pd.DataFrame({'Year':[2004, 2005, 2006, 2007],
                   'Bus':[5.2, 5.1, 2, 2.1]})

df['pct_change']= df['Bus'].pct_change()

df['5 pct change'] = df['pct_change'] < -0.05

Output:

    Year    Bus   pct_change       5 pct change
0   2004    5.2    NaN             False
1   2005    5.1   -0.019231        False
2   2006    2.0   -0.607843        True
3   2007    2.1    0.050000        False

To print out your request, "if it has decreased more than 5%, if yes print it out, if no, carry on scanning" you can do this.

(print(df.loc[df['5 pct change'] == True]))

Which prints this:

   Year  Bus  pct_change  5 pct change
2  2006  2.0   -0.607843          True
  • Related