I have this DataFrame:
      Date          vix2
0  2010-01  [14, 15, 16]
1  2010-02  [19, 20, 22]
2  2010-03  [28, 40, 60]
3  2010-04  [36, 24, 30]
4  2010-05  [34, 25, 22]
5  2010-06  [24, 29, 30]
I would like to get the median over every N consecutive rows, pooling the values from their lists. With N = 2, the result should look something like this:
      Date  median
0  2010-01     NaN
1  2010-02    17.5
2  2010-03    25.0
3  2010-04    33.0
4  2010-05    27.5
5  2010-06    27.0
This line of code looked like it should work, but I can't get past the error it raises:
np.median(result['vix2'].values, axis=0)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
CodePudding user response:
- Combine the values of every N (=2) consecutive rows into a single row.
- Use explode and groupby.
df["values"] = df["vix2"] + df["vix2"].shift()
output = df.explode("values").groupby("Date")["values"].median().reset_index()
>>> output
      Date  values
0  2010-01     NaN
1  2010-02    17.5
2  2010-03    25.0
3  2010-04    33.0
4  2010-05    27.5
5  2010-06    27.0
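For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same explode/groupby idea, using the sample data from the question. The explicit cast to float is my own addition (an assumption that it helps on pandas versions that are strict about medians on object columns); it is not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Date": ["2010-01", "2010-02", "2010-03", "2010-04", "2010-05", "2010-06"],
    "vix2": [[14, 15, 16], [19, 20, 22], [28, 40, 60],
             [36, 24, 30], [34, 25, 22], [24, 29, 30]],
})

# Concatenate each row's list with the previous row's list.
# The first row has no predecessor, so its entry stays NaN.
df["values"] = df["vix2"] + df["vix2"].shift()

# One row per pooled value; cast to float so the median runs on a
# numeric column rather than an object column (added step, see note above).
exploded = df.explode("values").astype({"values": float})

# Median of the pooled values for each Date
output = exploded.groupby("Date")["values"].median().reset_index()
print(output)
```

Note that grouping by Date assumes each Date appears only once in the original frame; with repeated dates, grouping on the original index would be the safer choice.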
CodePudding user response:
If you only need a rolling window of 2, you can use shift:
df['new'] = pd.DataFrame((df['vix2'] + df['vix2'].shift(1)).fillna({0:[]}).tolist()).median(axis=1)
Out[98]:
0 NaN
1 17.5
2 25.0
3 33.0
4 27.5
5 27.0
dtype: float64
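A more common way to handle a rolling window over an object-dtype column:
n = 2
# one median per window of n consecutive rows; the first n-1 rows stay NaN
df.loc[n-1:,'new'] = [np.median(df['vix2'].iloc[x:x+n].sum()) for x in range(len(df) - n + 1)]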
df
Out[121]:
           vix2   new
0  [14, 15, 16]   NaN
1  [19, 20, 22]  17.5
2  [28, 40, 60]  25.0
3  [36, 24, 30]  33.0
4  [34, 25, 22]  27.5
5  [24, 29, 30]  27.0
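If the window size needs to vary, the same idea can be wrapped in a small helper. This is only a sketch: rolling_list_median is a made-up name, and it pools each window with np.concatenate instead of summing the lists, which is a swap on my part rather than what the answer above does.

```python
import numpy as np
import pandas as pd

def rolling_list_median(lists, n):
    """Median of the pooled values in each window of n consecutive rows.

    `lists` is a Series whose entries are lists of numbers (hypothetical
    helper, not a pandas API). The first n-1 rows have no full window
    and are left as NaN.
    """
    values = list(lists)
    out = [np.nan] * len(values)
    for end in range(n - 1, len(values)):
        # Rows end-n+1 .. end form the window that finishes at this row
        window = values[end - n + 1 : end + 1]
        out[end] = float(np.median(np.concatenate(window)))
    return pd.Series(out, index=lists.index, dtype=float)

# Sample data copied from the question
df = pd.DataFrame({
    "Date": ["2010-01", "2010-02", "2010-03", "2010-04", "2010-05", "2010-06"],
    "vix2": [[14, 15, 16], [19, 20, 22], [28, 40, 60],
             [36, 24, 30], [34, 25, 22], [24, 29, 30]],
})

df["new"] = rolling_list_median(df["vix2"], n=2)
print(df[["Date", "new"]])
```

Because the helper works by position rather than by index label, it should also behave with a non-default index, and the lists in a window do not need to have equal lengths.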