I have this code:
df = pd.DataFrame({'R': {0: '1', 1: '2', 2: '3', 3: '4', 4: '5', 5: '6', 6: '7'}, 'a': {0: 1.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 3.0, 5: 2.0, 6: 3.0}, 'nv1': {0: [-1.0], 1: [-1.0], 2: [], 3: [], 4: [-2.0], 5: [-2.0, -1.0, -3.0, -1.0], 6: [-2.0, -1.0, -2.0, -1.0]}})
yielding the following dataframe:
R a nv1
0 1 1.0 [-1.0]
1 2 1.0 [-1.0]
2 3 2.0 []
3 4 3.0 []
4 5 3.0 [-2.0]
5 6 2.0 [-2.0, -1.0, -3.0, -1.0]
6 7 3.0 [-2.0, -1.0, -2.0, -1.0]
I need to calculate the median of each list in df['nv1'] and store it in a new column, df['med'].
The desired output is as follows:
R a nv1 med
1 1.0 [-1.0] -1
2 1.0 [-1.0] -1
3 2.0 []
4 3.0 []
5 3.0 [-2.0] -2
6 2.0 [-2.0, -1.0, -3.0, -1.0] -1.5
7 3.0 [-2.0, -1.0, -2.0, -1.0] -1.5
I tried each of the lines of code below independently, but both raised errors:
df['nv1'] = pd.to_numeric(df['nv1'],errors = 'coerce')
df['med'] = df['nv1'].median()
CodePudding user response:
Use np.median, applied to each row's list:
df['med'] = df['nv1'].apply(np.median)
Output:
>>> df
R a nv1 med
0 1 1.0 [-1.0] -1.0
1 2 1.0 [-1.0] -1.0
2 3 2.0 [] NaN
3 4 3.0 [] NaN
4 5 3.0 [-2.0] -2.0
5 6 2.0 [-2.0, -1.0, -3.0, -1.0] -1.5
6 7 3.0 [-2.0, -1.0, -2.0, -1.0] -1.5
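For reference, here is a self-contained sketch of the same idea with the imports included. Calling np.median on an empty list returns NaN but typically also emits a RuntimeWarning, so this version checks for empty lists first. The helper name safe_median is hypothetical, not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

# Same data as in the question, written as plain lists.
df = pd.DataFrame({
    'R': ['1', '2', '3', '4', '5', '6', '7'],
    'a': [1.0, 1.0, 2.0, 3.0, 3.0, 2.0, 3.0],
    'nv1': [[-1.0], [-1.0], [], [], [-2.0],
            [-2.0, -1.0, -3.0, -1.0], [-2.0, -1.0, -2.0, -1.0]],
})

def safe_median(values):
    # Hypothetical helper: return NaN for an empty list instead of
    # letting np.median warn about an empty slice.
    return np.median(values) if len(values) else np.nan

# Compute one median per row, from that row's list.
df['med'] = df['nv1'].apply(safe_median)
print(df)
```

The result should match the output above, with NaN in the rows whose list is empty.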
Or:
df['med'] = df['nv1'].explode().dropna().groupby(level=0).median()
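A runnable sketch of this second approach, under one assumption of mine: explode() returns an object-dtype Series, so the values are cast to float before grouping in case the installed pandas version is strict about object columns in groupby(...).median(). The intermediate name exploded is only for illustration.

```python
import pandas as pd

# Same data as in the question, written as plain lists.
df = pd.DataFrame({
    'R': ['1', '2', '3', '4', '5', '6', '7'],
    'a': [1.0, 1.0, 2.0, 3.0, 3.0, 2.0, 3.0],
    'nv1': [[-1.0], [-1.0], [], [], [-2.0],
            [-2.0, -1.0, -3.0, -1.0], [-2.0, -1.0, -2.0, -1.0]],
})

# One row per list element; the original index is repeated for each element.
# Empty lists become a single NaN row, which dropna() removes.
exploded = df['nv1'].explode().dropna().astype(float)  # cast is an added assumption

# Group by the original index and take the median of each group.
# Rows 2 and 3 have no group, so index alignment leaves them as NaN.
df['med'] = exploded.groupby(level=0).median()
print(df)
```

Because the assignment aligns on the index, rows that had an empty list simply get NaN, the same as with the np.median approach.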