I have created the following pandas dataframe called df:
import pandas as pd
import numpy as np
ds = {'degreeCentraloty':[1,2,3,4,5,6,7,8,9,10], 'col2' :['Email','Email','Email','Email','Email','Email','Other','Other','Other','Other']}
df = pd.DataFrame(data=ds)
The dataframe looks like this:
print(df)
   degreeCentraloty   col2
0                 1  Email
1                 2  Email
2                 3  Email
3                 4  Email
4                 5  Email
5                 6  Email
6                 7  Other
7                 8  Other
8                 9  Other
9                10  Other
I have then taken a subset of the df dataframe by selecting only the rows for which col2 = "Email":
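# .copy() makes data an independent DataFrame, so adding a column to it later
# does not raise SettingWithCopyWarning
data = df.loc[df['col2'] == 'Email'].copy()
print(data)
   degreeCentraloty   col2
0                 1  Email
1                 2  Email
2                 3  Email
3                 4  Email
4                 5  Email
5                 6  Email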
Then I have binned the field called degreeCentraloty like this:
data['dg_binned'] = pd.qcut(data['degreeCentraloty'], q = 2)
print(data)
   degreeCentraloty   col2     dg_binned
0                 1  Email  (0.999, 3.5]
1                 2  Email  (0.999, 3.5]
2                 3  Email  (0.999, 3.5]
3                 4  Email    (3.5, 6.0]
4                 5  Email    (3.5, 6.0]
5                 6  Email    (3.5, 6.0]
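If what you ultimately want is a list of bin edges, it may be simpler to ask pd.qcut for them at binning time. A minimal sketch, assuming the same df as above: qcut's retbins=True option returns the computed edges alongside the binned column. The names binned and bins are only illustrative.

```python
import pandas as pd

ds = {'degreeCentraloty': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
      'col2': ['Email'] * 6 + ['Other'] * 4}
df = pd.DataFrame(data=ds)

# .copy() so that adding a column does not raise SettingWithCopyWarning
data = df.loc[df['col2'] == 'Email'].copy()

# retbins=True makes qcut return (binned values, bin edges) instead of just the binned values
binned, bins = pd.qcut(data['degreeCentraloty'], q=2, retbins=True)
data['dg_binned'] = binned

print(bins.tolist())      # [1.0, 3.5, 6.0]: the lowest edge, then the upper edge of each bin
print(bins[1:].tolist())  # [3.5, 6.0]: upper edges only
```

bins[1:] drops the lowest edge and gives the [3.5, 6] form from the question. The full bins array is what pd.cut expects if you want to apply the same cut points to other values, e.g. pd.cut(values, bins=bins, include_lowest=True); values outside [1, 6] then come out as NaN.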
I need to convert the dg_binned field into a list that I can use as bins. So from this:
dg_binned
(0.999, 3.5]
(0.999, 3.5]
(0.999, 3.5]
(3.5, 6.0]
(3.5, 6.0]
(3.5, 6.0]
I need to get this:
[3.5,6]
Does anybody know how to do it in pandas?
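One way to get exactly that list, sketched under the assumption that dg_binned was produced by pd.qcut as above (so it has a categorical dtype): the column's categories are the distinct bins, stored as a sorted IntervalIndex, and its right attribute holds the upper edge of each bin.

```python
import pandas as pd

ds = {'degreeCentraloty': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
      'col2': ['Email'] * 6 + ['Other'] * 4}
df = pd.DataFrame(data=ds)
data = df.loc[df['col2'] == 'Email'].copy()
data['dg_binned'] = pd.qcut(data['degreeCentraloty'], q=2)

# the categories of the qcut result are the distinct bins: (0.999, 3.5], (3.5, 6.0]
intervals = data['dg_binned'].cat.categories

# .right is the upper edge of each bin; tolist() converts it to plain Python floats
upper_edges = intervals.right.tolist()
print(upper_edges)  # [3.5, 6.0]
```

intervals.left.tolist() gives the lower edges instead. The first one shows up as 0.999 rather than 1 because qcut lowers the first label edge slightly so that the minimum value falls inside a right-closed interval.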
Answer:
If I understand correctly, use:
i = pd.IntervalIndex(data['dg_binned'])
print(i)
IntervalIndex([(0.999, 3.5], (0.999, 3.5], (0.999, 3.5],(3.5, 6.0], (3.5, 6.0], (3.5, 6.0]],
closed='right',
name='dg_binned',
dtype='interval[float64]')
L = list(map(list, zip(i.left, i.right)))
print(L)
[[0.999, 3.5], [0.999, 3.5], [0.999, 3.5], [3.5, 6.0], [3.5, 6.0], [3.5, 6.0]]
Or:
L = [[i.left, i.right] for i in data['dg_binned']]
print(L)
[[0.999, 3.5], [0.999, 3.5], [0.999, 3.5], [3.5, 6.0], [3.5, 6.0], [3.5, 6.0]]
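Both versions return one [left, right] pair per row, so the pairs repeat. If what you need is a single list of edges to reuse as bins, the pairs can be collapsed into a sorted, de-duplicated list. This is a sketch building on the list comprehension version; edges and rebinned are illustrative names.

```python
import pandas as pd

ds = {'degreeCentraloty': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
      'col2': ['Email'] * 6 + ['Other'] * 4}
df = pd.DataFrame(data=ds)
data = df.loc[df['col2'] == 'Email'].copy()
data['dg_binned'] = pd.qcut(data['degreeCentraloty'], q=2)

# one [left, right] pair per row, as in the answer
L = [[iv.left, iv.right] for iv in data['dg_binned']]

# a set drops the repeated edges; float() turns numpy scalars into plain floats
edges = sorted({float(edge) for pair in L for edge in pair})
print(edges)  # [0.999, 3.5, 6.0]

# the edge list can be passed to pd.cut as its bins argument
rebinned = pd.cut(data['degreeCentraloty'], bins=edges)
print(rebinned.cat.codes.tolist())  # [0, 0, 0, 1, 1, 1]
```

These edges come from the interval labels, which pandas rounds to its display precision (3 by default), so for data with more decimals they can differ slightly from the exact edges returned by qcut(..., retbins=True).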