I have a pandas dataframe, like this example:
df = pd.DataFrame({
'steps': ['step1','step2', 'step3', 'step4','step5'],
'qty': [100, 95, 92, 87, 78]},
index=[0,1,2,3,4])
I would like to calculate the percentage of abandonments in each step, based on the first value.
Expected output:
Steps qty Tx %
Step1 100 0,00%
Step2 95 5,00%
Step3 92 3,00%
Step4 87 5,00%
Step5 78 9,00%
I thought about using df['qty'].pct_change(), but it doesn't give the result I expect.
The manual calculation would be something like:
values = [(1 - df['qty'][0]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
(1 - df['qty'][1]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
(1 - df['qty'][2]/df['qty'][0]) - (1-df['qty'][1]/df['qty'][0]),
(1 - df['qty'][3]/df['qty'][0]) - (1-df['qty'][2]/df['qty'][0]),
(1 - df['qty'][4]/df['qty'][0]) - (1-df['qty'][3]/df['qty'][0])]
However, this is not scalable, especially as the number of steps or time periods grows.
Could someone suggest a function, or explain an approach, that makes this calculation simpler?
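For reference, a minimal sketch of why pct_change gives different numbers here: it divides each change by the previous row's value, while the calculation above always divides by the first row's value. The variable names relative_to_previous and relative_to_first are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'steps': ['step1', 'step2', 'step3', 'step4', 'step5'],
    'qty': [100, 95, 92, 87, 78],
})

# pct_change computes (current - previous) / previous,
# so the denominator changes from row to row
relative_to_previous = df['qty'].pct_change() * 100

# the desired figure is (previous - current) / first,
# so the denominator is always the qty of step1
relative_to_first = -df['qty'].diff() * 100 / df['qty'].iloc[0]

print(relative_to_previous.round(2).tolist())
print(relative_to_first.tolist())
```

The two series agree (up to sign) only on the second row, where the previous value and the first value are the same.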
CodePudding user response:
You can use pd.Series.shift to shift the qty column one element down. Then simply calculate the difference between the shifted column and itself:
import pandas as pd
df = pd.DataFrame({
'steps': ['step1','step2', 'step3', 'step4','step5'],
'qty': [100, 95, 92, 87, 78],
})
df['Tx %'] = df.qty.shift() - df.qty
print(df)
# output:
# steps qty Tx %
# 0 step1 100 NaN
# 1 step2 95 5.0
# 2 step3 92 3.0
# 3 step4 87 5.0
# 4 step5 78 9.0
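Note that the difference above is an absolute count; it matches the requested percentages only because the first qty happens to be 100. A minimal sketch of the general case, using made-up data that starts at 200 (df2 and lost are illustrative names, not part of the original answer):

```python
import pandas as pd

# hypothetical data: same funnel shape, but starting at 200 instead of 100
df2 = pd.DataFrame({
    'steps': ['step1', 'step2', 'step3', 'step4', 'step5'],
    'qty': [200, 190, 184, 174, 156],
})

# absolute number lost at each step (NaN for the first row)
lost = df2['qty'].shift() - df2['qty']

# express the loss as a percentage of the first value; the first row becomes 0
df2['Tx %'] = (lost * 100 / df2['qty'].iloc[0]).fillna(0)
print(df2)
```

Dividing by the first value keeps the result comparable across funnels of different sizes.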
CodePudding user response:
This works:
# you have
values = [(1 - df['qty'][0]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
(1 - df['qty'][1]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
(1 - df['qty'][2]/df['qty'][0]) - (1-df['qty'][1]/df['qty'][0]),
(1 - df['qty'][3]/df['qty'][0]) - (1-df['qty'][2]/df['qty'][0]),
(1 - df['qty'][4]/df['qty'][0]) - (1-df['qty'][3]/df['qty'][0])]
# 1s cancel out, so the above is equivalent to
values = [(- df['qty'][0]/df['qty'][0]) + (df['qty'][0]/df['qty'][0]),
(- df['qty'][1]/df['qty'][0]) + (df['qty'][0]/df['qty'][0]),
(- df['qty'][2]/df['qty'][0]) + (df['qty'][1]/df['qty'][0]),
(- df['qty'][3]/df['qty'][0]) + (df['qty'][2]/df['qty'][0]),
(- df['qty'][4]/df['qty'][0]) + (df['qty'][3]/df['qty'][0])]
# since every element is divided by df['qty'][0], you can take it out, so the above is equivalent to
values = [(- df['qty'][0] + df['qty'][0]) / df['qty'][0],
(- df['qty'][1] + df['qty'][0]) / df['qty'][0],
(- df['qty'][2] + df['qty'][1]) / df['qty'][0],
(- df['qty'][3] + df['qty'][2]) / df['qty'][0],
(- df['qty'][4] + df['qty'][3]) / df['qty'][0]]
# the second term is the first term's column shifted one row down, so it's equivalent to:
values = (- df['qty'] + df['qty'].shift()).fillna(0) / df['qty'][0]
# in summary
df['Tx %'] = (df['qty'].shift() - df['qty']).fillna(0) / df['qty'][0] * 100
# output:
# steps qty Tx %
# 0 step1 100 0.0
# 1 step2 95 5.0
# 2 step3 92 3.0
# 3 step4 87 5.0
# 4 step5 78 9.0
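If this needs to be reused, the same calculation could be wrapped in a small helper. This is only a sketch: the function name step_dropoff_pct is hypothetical, and it uses diff() and iloc[0] instead of shift() and [0]. The assumption behind iloc[0] is that df['qty'][0] looks up the label 0, which is only guaranteed to exist with the default integer index, while iloc[0] always takes the first row by position.

```python
import pandas as pd

def step_dropoff_pct(qty: pd.Series) -> pd.Series:
    """Loss at each step as a percentage of the first value (hypothetical helper)."""
    # diff() is current - previous; negate it so a loss is a positive number
    lost = -qty.diff()
    # iloc[0] picks the first row by position, whatever the index labels are
    return (lost * 100 / qty.iloc[0]).fillna(0)

df = pd.DataFrame(
    {
        'steps': ['step1', 'step2', 'step3', 'step4', 'step5'],
        'qty': [100, 95, 92, 87, 78],
    },
    index=['a', 'b', 'c', 'd', 'e'],  # non-default index, for illustration only
)
df['Tx %'] = step_dropoff_pct(df['qty'])
print(df)
```

For per-group funnels (for example one funnel per time period), the same helper could in principle be applied to each group's qty column separately.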
CodePudding user response:
This answered my question:
df['Tx %'] = round(((1 - df.qty / df.qty[0]) - (1- df.qty.shift() / df.qty[0]))*100, 2)
Thanks to @jfaccioni