I have a table like this
animal data 2000 2001 2002
0 dog red 1.0 1.5 2.625
1 dog blue 3.0 4.5 7.875
2 cat red 5.0 7.5 13.125
3 cat blue 4.0 6.0 10.500
4 bird red NaN NaN NaN
5 bird blue 6.0 9.0 15.750
randomNumbers = np.array([1, 3, 5, 4, np.NaN, 6])
data = {
"animal": ["dog", "dog", "cat", "cat", "bird", "bird"],
"data": ["red", "blue", "red", "blue", "red", "blue"],
2000: randomNumbers,
2001: randomNumbers * 1.5,
2002: randomNumbers * 1.5 * 1.75,
}
df = pd.DataFrame.from_dict(data)
I want to calculate the percentage difference year by year, for each unique 'animal', with data='red', and add it to the dataframe.
I expect a result like this
animal data 2000 2001 2002
0 dog red 1.0 1.5 2.625
1 dog blue 3.0 4.5 7.875
2 cat red 5.0 7.5 13.125
3 cat blue 4.0 6.0 10.500
4 bird red NaN NaN NaN
5 bird blue 6.0 9.0 15.750
6 dog redPercent NaN 0.5 0.750
7 cat redPercent NaN 0.5 0.750
8 bird redPercent NaN NaN NaN
So I did this:
for animal in df["animal"].unique():
animalRow = df[(df["animal"] == animal) & (df["data"] == "red")]
percChange = animalRow.loc[:, 2000:2002].pct_change(axis=1)
newRow = [animal, "redPercent"] percChange.values.tolist()[0]
df.loc[len(df)]=newRow
print(df)
But it looks non pythonic. There is a better way?
CodePudding user response:
Use pct_change
after slicing and setting non year column aside temporarily, then concat
:
df = pd.concat(
[df,
(df[df['data'].eq('red')]
.assign(data=lambda d: d['data'].add('Percent'))
.set_index(['animal', 'data'])
.pct_change(axis=1)
.reset_index()
)], ignore_index=True)
Output:
animal data 2000 2001 2002
0 dog red 1.0 1.5 2.625
1 dog blue 3.0 4.5 7.875
2 cat red 5.0 7.5 13.125
3 cat blue 4.0 6.0 10.500
4 bird red NaN NaN NaN
5 bird blue 6.0 9.0 15.750
6 dog redPercent NaN 0.5 0.750
7 cat redPercent NaN 0.5 0.750
8 bird redPercent NaN NaN NaN