Apply a binner to other columns in pandas

Time:09-06

I have created the following pandas dataframe called df:

import pandas as pd
import numpy as np

ds = {'degreeCentraloty':[1,2,3,4,5,6,7,8,9,10], 'col2' :['Email','Email','Email','Email','Email','Email','Other','Other','Other','Other']}

df = pd.DataFrame(data=ds)

The dataframe looks like this:

print(df)
   degreeCentraloty   col2
0                 1  Email
1                 2  Email
2                 3  Email
3                 4  Email
4                 5  Email
5                 6  Email
6                 7  Other
7                 8  Other
8                 9  Other
9                10  Other

I have then taken a subset of the df dataframe by selecting only the rows for which col2 = "Email":

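# .copy() makes the subset independent, so adding a column later does not raise SettingWithCopyWarning
data = df.loc[df['col2'] == 'Email'].copy()
print(data)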

   degreeCentraloty   col2
0                 1  Email
1                 2  Email
2                 3  Email
3                 4  Email
4                 5  Email
5                 6  Email

Then I have binned the field called degreeCentraloty like this:

data['dg_binned'] = pd.qcut(data['degreeCentraloty'], q = 2)
print(data)

   degreeCentraloty   col2     dg_binned
0                 1  Email  (0.999, 3.5]
1                 2  Email  (0.999, 3.5]
2                 3  Email  (0.999, 3.5]
3                 4  Email    (3.5, 6.0]
4                 5  Email    (3.5, 6.0]
5                 6  Email    (3.5, 6.0]

I need to convert the field dg_binned into a list that I can use as a binner. So from this:

   dg_binned
(0.999, 3.5]
(0.999, 3.5]
(0.999, 3.5]
  (3.5, 6.0]
  (3.5, 6.0]
  (3.5, 6.0]

I need to get this:

[3.5,6]

Does anybody know how to do this in pandas?

CodePudding user response:

IIUC use:

i = pd.IntervalIndex(data['dg_binned'])
print(i)
IntervalIndex([(0.999, 3.5], (0.999, 3.5], (0.999, 3.5],(3.5, 6.0], (3.5, 6.0], (3.5, 6.0]],
              closed='right',
              name='dg_binned',
              dtype='interval[float64]')

L = list(map(list, zip(i.left, i.right)))
print(L)
[[0.999, 3.5], [0.999, 3.5], [0.999, 3.5], [3.5, 6.0], [3.5, 6.0], [3.5, 6.0]]

Or:

L = [[iv.left, iv.right] for iv in data['dg_binned']]
print(L)
[[0.999, 3.5], [0.999, 3.5], [0.999, 3.5], [3.5, 6.0], [3.5, 6.0], [3.5, 6.0]]
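
If the goal is the flat list [3.5, 6] from the question rather than one [left, right] pair per row, a minimal sketch follows. It takes the right edges from the categories of the binned column, then reuses them with pd.cut on the full dataframe. Padding the edges with -inf and +inf is my own assumption, so that values outside the "Email" range (7 to 10 here) still land in a bin; drop it if those rows should become NaN. The names right_edges, edges and bins are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'degreeCentraloty': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'col2': ['Email'] * 6 + ['Other'] * 4,
})
data = df.loc[df['col2'] == 'Email'].copy()

# retbins=True makes qcut return the bin edges alongside the binned column
data['dg_binned'], edges = pd.qcut(data['degreeCentraloty'], q=2, retbins=True)

# The categories of the binned column form an IntervalIndex;
# its right edges give the flat list asked for in the question
right_edges = data['dg_binned'].cat.categories.right.tolist()
print(right_edges)

# Assumption: open-ended outer bins so every value of the full column is covered
bins = [-np.inf, *right_edges, np.inf]
df['dg_binned'] = pd.cut(df['degreeCentraloty'], bins=bins)
print(df)
```

The edges array returned by retbins=True also contains the lowest edge, so edges[1:] should be another way to get the same right edges without going through the interval column.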