Calculate median of column with multiple values per cell (ranges)

Time:03-15

I have this code:

import pandas as pd
df = pd.DataFrame({'R': {0: '1', 1: '2', 2: '3', 3: '4', 4: '5', 5: '6', 6: '7'}, 'a': {0: 1.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 3.0, 5: 2.0, 6: 3.0}, 'nv1': {0: [-1.0], 1: [-1.0], 2: [], 3: [], 4: [-2.0], 5: [-2.0, -1.0, -3.0, -1.0], 6: [-2.0, -1.0, -2.0, -1.0]}})

yielding the following dataframe:

   R    a                       nv1
0  1  1.0                    [-1.0]
1  2  1.0                    [-1.0]
2  3  2.0                        []
3  4  3.0                        []
4  5  3.0                    [-2.0]
5  6  2.0  [-2.0, -1.0, -3.0, -1.0]
6  7  3.0  [-2.0, -1.0, -2.0, -1.0]

I need to calculate the median of each list in df['nv1'] and store it in a new column:

df['med'] = median of each list in df['nv1']

Desired output as follows

R    a    nv1                       med
1   1.0 [-1.0]                       -1
2   1.0 [-1.0]                       -1
3   2.0    []
4   3.0    []
5   3.0  [-2.0]                      -2
6   2.0  [-2.0, -1.0, -3.0, -1.0]    -1.5
7   3.0  [-2.0, -1.0, -2.0, -1.0]    -1.5

I tried each of the lines below independently, but both gave errors:

df['nv1'] = pd.to_numeric(df['nv1'],errors = 'coerce')
df['med'] = df['nv1'].median()

CodePudding user response:

Use np.median, applied to each list (empty lists give NaN):

import numpy as np
df['med'] = df['nv1'].apply(np.median)

Output:

>>> df
   R    a                       nv1  med
0  1  1.0                    [-1.0] -1.0
1  2  1.0                    [-1.0] -1.0
2  3  2.0                        []  NaN
3  4  3.0                        []  NaN
4  5  3.0                    [-2.0] -2.0
5  6  2.0  [-2.0, -1.0, -3.0, -1.0] -1.5
6  7  3.0  [-2.0, -1.0, -2.0, -1.0] -1.5
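
For reference, here is a self-contained sketch of the same approach. NumPy usually emits a RuntimeWarning when asked for the median of an empty list, so this version returns NaN for empty lists explicitly. The helper name list_median is hypothetical, not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "R": ["1", "2", "3", "4", "5", "6", "7"],
    "a": [1.0, 1.0, 2.0, 3.0, 3.0, 2.0, 3.0],
    "nv1": [[-1.0], [-1.0], [], [], [-2.0],
            [-2.0, -1.0, -3.0, -1.0], [-2.0, -1.0, -2.0, -1.0]],
})

def list_median(values):
    # Hypothetical helper: median of one cell's list, NaN when the list is empty.
    return np.median(values) if len(values) else np.nan

# Apply the helper to every cell of the list-valued column.
df["med"] = df["nv1"].apply(list_median)
print(df)
```

The result should match the output above; the only difference is that np.median is never called on an empty list.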

Or explode the lists into one row per value and take the median per original index:

df['med'] = df['nv1'].explode().dropna().groupby(level=0).median()
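
A step-by-step sketch of the explode approach on a smaller frame. The astype(float) call is an addition not in the line above: explode returns an object-dtype Series, and converting it first is a cautious assumption so the median is computed on numeric data regardless of pandas version.

```python
import pandas as pd

# Reduced example: one single-value list, one empty list, one multi-value list.
df = pd.DataFrame({"nv1": [[-1.0], [], [-2.0, -1.0, -3.0, -1.0]]})

# One row per list element; the original index is repeated,
# and an empty list becomes a single NaN row.
exploded = df["nv1"].explode()

# Drop the NaN rows from empty lists, convert to float,
# then group by the original index and take the median.
med = exploded.dropna().astype(float).groupby(level=0).median()

# Assignment aligns on the index, so rows without values get NaN.
df["med"] = med
print(df)
```

Because the assignment aligns on the index, rows whose list was empty are simply absent from the grouped result and end up as NaN in df['med'].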