Home > Enterprise >  How to create bin based on quantile() in Python
How to create bin based on quantile() in Python

Time:05-04

I want to create a function which can help me to create a Bin based on my quantile values i.e. if it is less than 0.33 then Low, between 0.33 to 0.66 then Medium else High

I have few 100 of columns in my data frame and want to perform this for some column out of 100 hence need to use function.

I have written below code:

dict_1 = {'Col1' : [10, 20, 30, 40, 50, 60, 70, 80, 90]}
test_df_1 = pd.DataFrame(dict_1)
print(test_df_1.quantile([0.33, 0.66, 1]))

value = []
def quantile_bin(x):
    value.append(np.where(x <= x.quantile(0.33), "Low", "NA"))
    value.append(np.where((x > x.quantile(0.33)) & (x <=x.quantile(0.66)) , "Medium", "NA"))
    value.append(np.where(x > x.quantile(0.66), "High", "NA"))

print(value)

OUTPUT ==
[array(['Low', 'Low', 'Low', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA'], dtype='<U3'),
array(['NA', 'NA', 'NA', 'High', 'High', 'High', 'NA', 'NA', 'NA'], dtype='<U4'),
array(['NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'Medium', 'Medium', 'Medium'], dtype='<U6')]

Above code is not working. Is there any way I can enter multiple condition with there output inside a function?

CodePudding user response:

Method pd.qcut() does exactly what you need:

df = pd.DataFrame(np.random.normal(size=100), columns=('x',))    
labels = 'Low', 'Medium', 'High'
df['labels'] = pd.qcut(df['x'], len(labels), labels=labels)

CodePudding user response:

Use qcut and because need processing 100 columns use DataFrame.apply with lambda function if need processing each column separately:

np.random.seed(2022)

df = pd.DataFrame(np.random.randint(20, size=(20, 5)))
print (df)
     0   1   2   3   4
0   13  16  17  17  16
1   18  16   9   1  11
2   19   8  11  16  19
3    6   8  14  16  11
4    1   2  16  15   5
5   17   5  18  19  12
6   13  17  12  14   0
7   13  13  10   0  13
8    5  11  11  14   2
9    0   4  14   9   8
10   8   0  19   2   1
11  15  18  17   9   6
12  15   6   7   2  13
13   1  10   9   4   0
14  12   4  14   8   1
15  18  10  12   0   8
16   3   4  19  17  18
17   2   7   3  12   3
18  11  14  13   3  12
19  19   1   3  10  11

df = df.apply(lambda x: pd.qcut(x, 3, labels=['Low','Medium','High']))
print (df)
         0       1       2       3       4
0   Medium    High    High    High    High
1     High    High     Low     Low  Medium
2     High  Medium     Low    High    High
3      Low  Medium  Medium    High  Medium
4      Low     Low    High    High     Low
5     High     Low    High    High    High
6   Medium    High  Medium  Medium     Low
7   Medium    High     Low     Low    High
8      Low    High     Low  Medium     Low
9      Low     Low  Medium  Medium  Medium
10  Medium     Low    High     Low     Low
11    High    High    High  Medium  Medium
12    High  Medium     Low     Low    High
13     Low  Medium     Low     Low     Low
14  Medium     Low  Medium  Medium     Low
15    High  Medium  Medium     Low  Medium
16     Low     Low    High    High    High
17     Low  Medium     Low  Medium     Low
18  Medium    High  Medium     Low    High
19    High     Low     Low  Medium  Medium

Another idea if need processing all columns together is use DataFrame.stack for MultiIndex Series, pass to qcut and last use Series.unstack (output is different):

df = pd.qcut(df.stack(), 3, labels=['Low','Medium','High']).unstack()
print (df)
         0       1       2       3       4
0   Medium    High    High    High    High
1     High    High  Medium     Low  Medium
2     High  Medium  Medium    High    High
3      Low  Medium    High    High  Medium
4      Low     Low    High    High     Low
5     High     Low    High    High  Medium
6   Medium    High  Medium    High     Low
7   Medium  Medium  Medium     Low  Medium
8      Low  Medium  Medium    High     Low
9      Low     Low    High  Medium  Medium
10  Medium     Low    High     Low     Low
11    High    High    High  Medium     Low
12    High     Low     Low     Low  Medium
13     Low  Medium  Medium     Low     Low
14  Medium     Low    High  Medium     Low
15    High  Medium  Medium     Low  Medium
16     Low     Low    High    High    High
17     Low     Low     Low  Medium     Low
18  Medium    High  Medium     Low  Medium
19    High     Low     Low  Medium  Medium
  • Related