Apply a formula to a pandas DataFrame

Time:04-26

I have a very simple question. I have a DataFrame like this:

In [19]: df = DataFrame(randn(10,1),columns=list('A'))

In [20]: df

Out[20]: 
          A  
0  0.958465  
1 -0.769077  
2  0.598059  
3  0.290926 
4 -0.248910 
5 -1.352096 
6  0.009125
7 -0.993082
8 -0.593704
9  0.523332

I would like to create a new column B with the following information:

          A              B
0  0.958465  
1 -0.769077  A1*A1 + 2*A0*A2
2  0.598059  A2*A2 + 2*A1*A3
3  0.290926  A3*A3 + 2*A2*A4
4 -0.248910  A4*A4 + 2*A3*A5
5 -1.352096  ...
6  0.009125  ...
7 -0.993082  ...
8 -0.593704  ...
9  0.523332  ...

It is a sort of convolution or autocorrelation, but the window moves with each row. How can I define such a formula in pandas?

Second question: how can I make the number of points used in the formula variable? In the example I only use the previous and the next point, but I would like to pass a parameter that tells pandas how many points on either side to include in the calculation.

CodePudding user response:

df['B'] = df['A']**2 + 2 * df['A'].shift() * df['A'].shift(-1)
df
          A         B
0  0.958465       NaN
1 -0.769077  1.737917
2  0.598059 -0.089814
3  0.290926 -0.213088
4 -0.248910 -0.724764
5 -1.352096  1.823621
6  0.009125  2.685568
7 -0.993082  0.975377
8 -0.593704 -0.686939
9  0.523332       NaN
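
To see why the first and last rows come out as NaN, it may help to look at what `shift` does on a tiny series. The sketch below uses made-up values (not the data from the question) and the illustrative names `prev_vals` and `next_vals`:

```python
import pandas as pd

# Small made-up series so the arithmetic is easy to follow by hand.
s = pd.Series([1.0, 2.0, 3.0, 4.0], name="A")

prev_vals = s.shift()    # value from the previous row; the first row has none, so NaN
next_vals = s.shift(-1)  # value from the next row; the last row has none, so NaN

# Same formula as in the answer above: A[i]**2 + 2 * A[i-1] * A[i+1]
b = s**2 + 2 * prev_vals * next_vals

print(pd.DataFrame({"A": s, "prev": prev_vals, "next": next_vals, "B": b}))
```

Any arithmetic involving NaN yields NaN, so rows that lack a neighbour on either side have no value in B.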

CodePudding user response:

You can make a function like this to allow a variable number of lags.

import pandas as pd

def func(s, lags=1):
    # s squared plus the product of each symmetric pair of neighbours, lag by lag
    return s ** 2 + sum(s.shift(lag) * s.shift(-lag) for lag in range(1, lags + 1))

df = pd.DataFrame({"A": [0.958465, -0.769077, 0.598059, 0.290926, -0.248910, -1.352096, 0.009125, -0.993082, -0.593704, 0.523332]})
df["B"] = func(df["A"], 1) # takes 1 point on either side
df["C"] = func(df["A"], 2) # takes 2 points on either side
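
Note that `func` above does not apply the factor of 2 that appears in the question's formula. If that factor is intended, one possible variant is sketched below. The name `func_with_factor` and the `factor` parameter are hypothetical, and applying the same factor to every lag is an assumption, since the question only spells out the one-neighbour case.

```python
import pandas as pd

def func_with_factor(s, lags=1, factor=2):
    # Sum of products of symmetric neighbours: s[i-lag] * s[i+lag] for lag = 1..lags.
    cross = sum(s.shift(lag) * s.shift(-lag) for lag in range(1, lags + 1))
    # Assumption: the same factor multiplies every cross term.
    return s ** 2 + factor * cross

# Values copied from the question's DataFrame.
df = pd.DataFrame({"A": [0.958465, -0.769077, 0.598059, 0.290926, -0.248910,
                         -1.352096, 0.009125, -0.993082, -0.593704, 0.523332]})
df["B"] = func_with_factor(df["A"], lags=1)  # 1 point on either side
df["C"] = func_with_factor(df["A"], lags=2)  # 2 points on either side
print(df)
```

With `lags=1` this should agree with the column B shown in the first answer. With `lags=2` the first two and last two rows are NaN, because those rows do not have two neighbours on both sides.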