Can I get the median of the values across multiple rows in a pandas DataFrame?

Time:02-16

I have this DataFrame

      Date          vix2
0  2010-01  [14, 15, 16]
1  2010-02  [19, 20, 22]
2  2010-03  [28, 40, 60]
3  2010-04  [36, 24, 30]
4  2010-05  [34, 25, 22]
5  2010-06  [24, 29, 30]

and I would like to get the median over every N consecutive rows (here N = 2), pooling the values of each row with those of the row before it, so that the result looks something like this:

      Date  median
0  2010-01     NaN
1  2010-02    17.5
2  2010-03    25.0
3  2010-04    33.0
4  2010-05    27.5
5  2010-06    27.0

I tried the following line, but I can't get past the error it raises:

np.median(result['vix2'].values, axis=0)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

CodePudding user response:

  1. Combine values for every N (=2) rows into a single row.
  2. Use explode and groupby
df["values"] = df["vix2"] + df["vix2"].shift()
output = df.explode("values").groupby("Date")["values"].median().reset_index()

>>> output
      Date  values
0  2010-01     NaN
1  2010-02    17.5
2  2010-03    25.0
3  2010-04    33.0
4  2010-05    27.5
5  2010-06    27.0
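
The two steps above can also be written as a self-contained sketch that works for any window size. This is only an illustration under a few assumptions: the sample frame is rebuilt from the question, `n` is a name introduced here for the window size, the lists are pooled with a plain Python loop instead of `shift`, and the exploded column is cast to float explicitly so the median does not depend on how object columns are handled.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Date": ["2010-01", "2010-02", "2010-03", "2010-04", "2010-05", "2010-06"],
    "vix2": [[14, 15, 16], [19, 20, 22], [28, 40, 60],
             [36, 24, 30], [34, 25, 22], [24, 29, 30]],
})

n = 2  # assumed name for the window size (number of rows pooled per median)
rows = df["vix2"].tolist()

# Step 1: pool each row's list with the n-1 rows before it.
# Rows without a full window get NaN.
df["values"] = [
    sum(rows[i - n + 1 : i + 1], []) if i >= n - 1 else np.nan
    for i in range(len(rows))
]

# Step 2: one value per row, cast to float, then take the median per Date.
exploded = df.explode("values").astype({"values": float})
output = exploded.groupby("Date")["values"].median().reset_index()
print(output)
```

Because the result is grouped by `Date`, this version relies on the dates being unique.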

CodePudding user response:

If you only need a rolling window of 2, you can use shift:

df['new'] = pd.DataFrame((df['vix2'] + df['vix2'].shift(1)).fillna({0:[]}).tolist()).median(1)
Out[98]: 
0     NaN
1    17.5
2    25.0
3    33.0
4    27.5
5    27.0
dtype: float64

A more general way to handle rolling over an object-dtype column:

n = 2
df.loc[n-1:,'new'] = [np.median(df['vix2'].iloc[x:x+n].sum()) for x in range(len(df) - n + 1)]
df
Out[121]: 
           vix2   new
0  [14, 15, 16]   NaN
1  [19, 20, 22]  17.5
2  [28, 40, 60]  25.0
3  [36, 24, 30]  33.0
4  [34, 25, 22]  27.5
5  [24, 29, 30]  27.0
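
If every list has the same length, as in the sample data, the rolling median can also be sketched without a Python-level loop. This is not from the answers above: it assumes NumPy 1.20 or later for `sliding_window_view`, rebuilds the sample frame from the question, and uses `n` for the window size.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Date": ["2010-01", "2010-02", "2010-03", "2010-04", "2010-05", "2010-06"],
    "vix2": [[14, 15, 16], [19, 20, 22], [28, 40, 60],
             [36, 24, 30], [34, 25, 22], [24, 29, 30]],
})

n = 2  # window size in rows

# Assumption: all lists have equal length, so they stack into a 2-D array.
arr = np.array(df["vix2"].tolist(), dtype=float)   # shape (rows, k)

# Each window holds n consecutive rows: shape (rows - n + 1, k, n).
windows = sliding_window_view(arr, n, axis=0)

# Flatten every window and take its median.
medians = np.median(windows.reshape(len(windows), -1), axis=1)

# The first n-1 rows have no complete window, so pad them with NaN.
df["median"] = np.concatenate([np.full(n - 1, np.nan), medians])
print(df[["Date", "median"]])
```

For lists of unequal length the array cannot be stacked this way, and a loop-based approach like the one above is the safer choice.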