I have a very simple question. I have a DataFrame like this:
In [19]: df = DataFrame(randn(10, 1), columns=list('A'))  # assumes: from pandas import DataFrame; from numpy.random import randn
In [20]: df
Out[20]:
A
0 0.958465
1 -0.769077
2 0.598059
3 0.290926
4 -0.248910
5 -1.352096
6 0.009125
7 -0.993082
8 -0.593704
9 0.523332
I would like to create a new column B with the following information:
A B
0 0.958465
1 -0.769077 A1*A1 + 2*A0*A2
2 0.598059 A2*A2 + 2*A1*A3
3 0.290926 A3*A3 + 2*A2*A4
4 -0.248910 A4*A4 + 2*A3*A5
5 -1.352096 ...
6 0.009125 ...
7 -0.993082 ...
8 -0.593704 ...
9 0.523332 ...
It is a sort of convolution or autocorrelation, but with a different window for every row. How can I define such a formula in pandas?
Second question: how can I make the number of points used in the formula variable? In the example I only use the previous and the next point, but how can I pass a parameter that tells pandas how many points on each side to include in the calculation?
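Written out, the table above asks for the following, with rows indexed i = 0, ..., N-1 (this is a restatement; the n-point form is one reading of the second question, the one the lag-based answer adopts, where n is the number of points on each side):

```latex
B_i = A_i^2 + 2\,A_{i-1}\,A_{i+1}, \qquad 1 \le i \le N-2
B_i^{(n)} = A_i^2 + 2\sum_{k=1}^{n} A_{i-k}\,A_{i+k}, \qquad n \le i \le N-1-n
```

Rows closer than n to either end have a missing neighbour, which is why they come out as NaN in pandas.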
Answer:
df['B'] = df['A']**2 + 2 * df['A'].shift() * df['A'].shift(-1)
df
A B
0 0.958465 NaN
1 -0.769077 1.737917
2 0.598059 -0.089814
3 0.290926 -0.213088
4 -0.248910 -0.724764
5 -1.352096 1.823621
6 0.009125 2.685568
7 -0.993082 0.975377
8 -0.593704 -0.686939
9 0.523332 NaN
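A self-contained sketch of the same one-liner, rebuilding column A from the values printed in the question so the result can be checked. `shift()` (default `periods=1`) moves each value down one row, so row i sees A[i-1]; `shift(-1)` moves values up, so row i sees A[i+1]. The first and last rows have no neighbour on one side, so their product is NaN. The names `prev_a` and `next_a` are only for readability.

```python
import pandas as pd

# Values of column A as printed in the question (rounded to 6 decimals).
df = pd.DataFrame({"A": [0.958465, -0.769077, 0.598059, 0.290926, -0.248910,
                         -1.352096, 0.009125, -0.993082, -0.593704, 0.523332]})

# shift() aligns A[i-1] with row i; shift(-1) aligns A[i+1] with row i.
prev_a = df["A"].shift()
next_a = df["A"].shift(-1)

# B_i = A_i^2 + 2 * A_{i-1} * A_{i+1}; NaN where a neighbour is missing.
df["B"] = df["A"] ** 2 + 2 * prev_a * next_a
print(df)
```

Because the inputs are the rounded printed values, the recomputed B agrees with the table above only to about five decimal places.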
Answer:
You can write a function like this to allow a variable number of lags (points taken on either side of the current row).
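import pandas as pd
def func(s, lags=1):
    # B_i = s_i**2 + 2 * sum over k = 1..lags of s_{i-k} * s_{i+k} (factor 2 as in the question)
    return s ** 2 + 2 * sum(s.shift(lag) * s.shift(-lag) for lag in range(1, lags + 1))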
df = pd.DataFrame({"A": [0.958465, -0.769077, 0.598059, 0.290926, -0.248910, -1.352096, 0.009125, -0.993082, -0.593704, 0.523332]})
df["B"] = func(df["A"], 1) # takes 1 point on either side
df["C"] = func(df["A"], 2) # takes 2 points on either side
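A self-contained sketch for checking the lag-based approach, assuming the intended generalization is B_i = A_i² + 2·(sum over k = 1..lags of A_{i-k}·A_{i+k}), with the factor 2 taken from the question's formula. `symmetric_lag_sum` is a hypothetical name used only here. It shows that `lags=1` reproduces the one-line answer's column B, and that `lags=2` adds the A_{i-2}·A_{i+2} term while leaving two NaN rows at each end.

```python
import pandas as pd


def symmetric_lag_sum(s: pd.Series, lags: int = 1) -> pd.Series:
    """Hypothetical helper: s_i**2 + 2 * sum_{k=1..lags} s_{i-k} * s_{i+k}.

    Rows within `lags` of either end have a missing neighbour, so they are NaN.
    """
    if lags < 1:
        raise ValueError("lags must be >= 1")
    # Built-in sum starts from 0; adding Series keeps index alignment
    # and propagates NaN from any missing neighbour.
    cross = sum(s.shift(k) * s.shift(-k) for k in range(1, lags + 1))
    return s ** 2 + 2 * cross


a = pd.Series([0.958465, -0.769077, 0.598059, 0.290926, -0.248910,
               -1.352096, 0.009125, -0.993082, -0.593704, 0.523332])

b1 = symmetric_lag_sum(a, 1)  # same formula as the one-line answer's column B
b2 = symmetric_lag_sum(a, 2)  # uses two points on either side
print(pd.DataFrame({"A": a, "B": b1, "C": b2}))
```

Each lag costs two `shift` calls and one vectorized multiply, so the work grows linearly with `lags`; for a handful of lags this stays simple and fast without an explicit rolling window.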