Solving with Pandas
I have this dataset this is the question: For a user selected type of public transport type, display the year and the average trip distance in which the average has declined by at least 5% over the previous year.
Year MRT LRT Bus Taxi
0 2004 11.5 0.0 5.2 8.5
1 2005 11.5 0.0 5.4 8.8
2 2006 11.3 0.0 5.3 9.1
3 2007 11.2 0.0 5.3 9.7
4 2008 11.2 2.1 5.3 9.0
5 2009 10.8 2.1 5.1 9.0
6 2010 10.3 2.1 4.8 9.3
7 2011 10.0 2.0 4.5 9.6
8 2012 9.6 1.0 4.4 9.7
9 2013 9.5 2.0 4.3 9.8
10 2014 9.2 2.0 4.3 10.0
eg. If i choose "Bus": For easier visualization of that data column:
Year Bus
0 2004 5.2
1 2005 5.4
2 2006 5.3
3 2007 5.3
4 2008 5.3
5 2009 5.1
6 2010 4.8
7 2011 4.5
8 2012 4.4
9 2013 4.3
10 2014 4.3
it will compare 2005 data vs 2004 data if it has decreased more than 5%, if yes print it out, if no, carry on scanning.
expected output: "In Year 2010, data is 4.8 and has declined more than 5% compared to the previous year" (note: 5.1 * 0.95 = 4.85)
"In year 2011, data is 4.5 and has declined more than 5% compared to its previous year" (note: 4.8 * 0.95 = 4.56)
I was thinking of a for loop but I'm not really sure about how to use to it compare for 5% decrement.
CodePudding user response:
You could add a column (per transport way) taking the value of your condition:
df['hasDecreasedMarkedly'] = (df['Bus'] - df['Bus'].shift(1))/df['Bus'] >= 0.05
CodePudding user response:
You can use pandas .pct_change().
import pandas as pd
df = pd.DataFrame({'Year':[2004, 2005, 2006, 2007],
'Bus':[5.2, 5.1, 2, 2.1]})
df['pct_change']= df['Bus'].pct_change()
df['5 pct change'] = df['pct_change'] < -0.05
Output:
Year Bus pct_change 5 pct change
0 2004 5.2 NaN False
1 2005 5.1 -0.019231 False
2 2006 2.0 -0.607843 True
3 2007 2.1 0.050000 False
To print out your request, "if it has decreased more than 5%, if yes print it out, if no, carry on scanning" you can do this.
(print(df.loc[df['5 pct change'] == True]))
Which prints this:
Year Bus pct_change 5 pct change
2 2006 2.0 -0.607843 True