Is there a Python function to calculate the diff based on the first element?

Time:05-17

I have a pandas dataframe, like this example:

import pandas as pd

df = pd.DataFrame({
    'steps': ['step1', 'step2', 'step3', 'step4', 'step5'],
    'qty': [100, 95, 92, 87, 78]},
    index=[0, 1, 2, 3, 4])

I would like to calculate the percentage of abandonments in each step, based on the first value.

Output:

Steps   qty Tx %
Step1   100 0,00%
Step2   95  5,00%
Step3   92  3,00%
Step4   87  5,00%
Step5   78  9,00%

I thought about using Series.pct_change(), but it doesn't work as expected.
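For reference, a minimal sketch of why pct_change() gives a different answer here: it divides each difference by the previous element, while the desired output divides by the first one. The variable names vs_previous and vs_first are only illustrative.

```python
import pandas as pd

qty = pd.Series([100, 95, 92, 87, 78])

# pct_change divides each difference by the previous element
vs_previous = qty.pct_change() * 100

# The desired figure divides each difference by the first element
vs_first = qty.diff() / qty.iloc[0] * 100

print(vs_previous.round(2).tolist())
print(vs_first.round(2).tolist())
```

Both agree on step2 (-5%), but from step3 onward they diverge: relative to the previous step the change is about -3.16%, relative to the first step it is -3%. The signs are negative because both measure change rather than loss.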

The manual calculation would be something like:


values = [(1 - df['qty'][0]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
          (1 - df['qty'][1]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
          (1 - df['qty'][2]/df['qty'][0]) - (1-df['qty'][1]/df['qty'][0]),
          (1 - df['qty'][3]/df['qty'][0]) - (1-df['qty'][2]/df['qty'][0]),
          (1 - df['qty'][4]/df['qty'][0]) - (1-df['qty'][3]/df['qty'][0])]

However, I don't think this scales, especially as the number of steps or time periods grows.

Could someone help me think of some function or show a rationale that can make this calculation simpler?

CodePudding user response:

You can use pd.Series.shift to shift the qty column one element down. Then simply calculate the difference between the shifted column and itself:

import pandas as pd

df = pd.DataFrame({
    'steps': ['step1','step2', 'step3', 'step4','step5'],
    'qty': [100, 95, 92, 87, 78],
})

df['Tx %'] = df.qty.shift() - df.qty
print(df)

# output:
#    steps  qty  Tx %
# 0  step1  100   NaN
# 1  step2   95   5.0
# 2  step3   92   3.0
# 3  step4   87   5.0
# 4  step5   78   9.0
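Note that the plain subtraction returns absolute drops, which match the percentages here only because the first qty is 100. A small sketch, assuming the first value can be anything (the sample numbers below are made up), that divides by the first element to get a percentage:

```python
import pandas as pd

# Made-up data where the first quantity is not 100
df = pd.DataFrame({
    'steps': ['step1', 'step2', 'step3'],
    'qty': [200, 190, 184],
})

# Absolute drop between consecutive steps (NaN for the first row)
drop = df['qty'].shift() - df['qty']

# Express the drop as a percentage of the first step's quantity
df['Tx %'] = (drop * 100 / df['qty'].iloc[0]).fillna(0)
print(df)
```

Here the absolute drops are 10 and 6, while the percentages relative to the first step are 5.0 and 3.0.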

CodePudding user response:

This works:

# you have
values = [(1 - df['qty'][0]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
          (1 - df['qty'][1]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
          (1 - df['qty'][2]/df['qty'][0]) - (1-df['qty'][1]/df['qty'][0]),
          (1 - df['qty'][3]/df['qty'][0]) - (1-df['qty'][2]/df['qty'][0]),
          (1 - df['qty'][4]/df['qty'][0]) - (1-df['qty'][3]/df['qty'][0])]

# 1s cancel out, so the above is equivalent to
values = [( - df['qty'][0]/df['qty'][0]) + (df['qty'][0]/df['qty'][0]),
          ( - df['qty'][1]/df['qty'][0]) + (df['qty'][0]/df['qty'][0]),
          ( - df['qty'][2]/df['qty'][0]) + (df['qty'][1]/df['qty'][0]),
          ( - df['qty'][3]/df['qty'][0]) + (df['qty'][2]/df['qty'][0]),
          ( - df['qty'][4]/df['qty'][0]) + (df['qty'][3]/df['qty'][0])]

# since every element is divided by df['qty'][0], you can take it out, so the above is equivalent to
values = [( - df['qty'][0] + df['qty'][0]) / df['qty'][0],
          ( - df['qty'][1] + df['qty'][0]) / df['qty'][0],
          ( - df['qty'][2] + df['qty'][1]) / df['qty'][0],
          ( - df['qty'][3] + df['qty'][2]) / df['qty'][0],
          ( - df['qty'][4] + df['qty'][3]) / df['qty'][0]]

# the RHS is the LHS shifted one level down, so it's equivalent to:
values = (- df['qty'] + df['qty'].shift()).fillna(0) / df['qty'][0]

# in summary
df['Tx %'] = (df['qty'].shift() - df['qty']).fillna(0) / df['qty'][0] * 100
print(df)

# output:
#    steps  qty  Tx %
# 0  step1  100   0.0
# 1  step2   95   5.0
# 2  step3   92   3.0
# 3  step4   87   5.0
# 4  step5   78   9.0
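Since the question mentions several time periods, here is a rough sketch of how the same formula might be applied per period with groupby. The period column and its values are hypothetical, not part of the original data.

```python
import pandas as pd

# Hypothetical data: one funnel per period
df = pd.DataFrame({
    'period': ['jan', 'jan', 'jan', 'feb', 'feb', 'feb'],
    'steps': ['step1', 'step2', 'step3'] * 2,
    'qty': [100, 95, 92, 200, 180, 170],
})

grouped = df.groupby('period', sort=False)['qty']

# First quantity of each period, repeated on every row of that period
first = grouped.transform('first')

# Drop from the previous step within the same period (NaN on each first row)
drop = -grouped.diff()

df['Tx %'] = (drop * 100 / first).fillna(0)
print(df)
```

Each period is normalized by its own first step, so adding more periods or steps needs no change to the code.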

CodePudding user response:

This resolved my doubt:

df['Tx %'] = round(((1 - df.qty / df.qty[0]) - (1- df.qty.shift() / df.qty[0]))*100, 2)
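One caveat: df.qty[0] looks up the index label 0, so it relies on the default 0-based index. A sketch of a variant that uses .iloc[0] to take the first row by position instead, assuming a made-up index that does not contain 0:

```python
import pandas as pd

# Hypothetical data whose index does not start at 0
df = pd.DataFrame(
    {'steps': ['step1', 'step2', 'step3'], 'qty': [100, 95, 92]},
    index=[10, 11, 12],
)

# Positional access: the first row regardless of the index labels
first = df['qty'].iloc[0]

df['Tx %'] = ((df['qty'].shift() - df['qty']) * 100 / first).fillna(0).round(2)
print(df)
```

With this index, df.qty[0] would raise a KeyError, while .iloc[0] still returns 100.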

Thanks to @jfaccioni
