I want to create a function which can help me to create a Bin based on my quantile values i.e. if it is less than 0.33 then Low, between 0.33 to 0.66 then Medium else High
I have few 100 of columns in my data frame and want to perform this for some column out of 100 hence need to use function.
I have written below code:
dict_1 = {'Col1' : [10, 20, 30, 40, 50, 60, 70, 80, 90]}
test_df_1 = pd.DataFrame(dict_1)
print(test_df_1.quantile([0.33, 0.66, 1]))
value = []
def quantile_bin(x):
value.append(np.where(x <= x.quantile(0.33), "Low", "NA"))
value.append(np.where((x > x.quantile(0.33)) & (x <=x.quantile(0.66)) , "Medium", "NA"))
value.append(np.where(x > x.quantile(0.66), "High", "NA"))
print(value)
OUTPUT ==
[array(['Low', 'Low', 'Low', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA'], dtype='<U3'),
array(['NA', 'NA', 'NA', 'High', 'High', 'High', 'NA', 'NA', 'NA'], dtype='<U4'),
array(['NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'Medium', 'Medium', 'Medium'], dtype='<U6')]
Above code is not working. Is there any way I can enter multiple condition with there output inside a function?
CodePudding user response:
Method pd.qcut()
does exactly what you need:
df = pd.DataFrame(np.random.normal(size=100), columns=('x',))
labels = 'Low', 'Medium', 'High'
df['labels'] = pd.qcut(df['x'], len(labels), labels=labels)
CodePudding user response:
Use qcut
and because need processing 100 columns use DataFrame.apply
with lambda function if need processing each column separately:
np.random.seed(2022)
df = pd.DataFrame(np.random.randint(20, size=(20, 5)))
print (df)
0 1 2 3 4
0 13 16 17 17 16
1 18 16 9 1 11
2 19 8 11 16 19
3 6 8 14 16 11
4 1 2 16 15 5
5 17 5 18 19 12
6 13 17 12 14 0
7 13 13 10 0 13
8 5 11 11 14 2
9 0 4 14 9 8
10 8 0 19 2 1
11 15 18 17 9 6
12 15 6 7 2 13
13 1 10 9 4 0
14 12 4 14 8 1
15 18 10 12 0 8
16 3 4 19 17 18
17 2 7 3 12 3
18 11 14 13 3 12
19 19 1 3 10 11
df = df.apply(lambda x: pd.qcut(x, 3, labels=['Low','Medium','High']))
print (df)
0 1 2 3 4
0 Medium High High High High
1 High High Low Low Medium
2 High Medium Low High High
3 Low Medium Medium High Medium
4 Low Low High High Low
5 High Low High High High
6 Medium High Medium Medium Low
7 Medium High Low Low High
8 Low High Low Medium Low
9 Low Low Medium Medium Medium
10 Medium Low High Low Low
11 High High High Medium Medium
12 High Medium Low Low High
13 Low Medium Low Low Low
14 Medium Low Medium Medium Low
15 High Medium Medium Low Medium
16 Low Low High High High
17 Low Medium Low Medium Low
18 Medium High Medium Low High
19 High Low Low Medium Medium
Another idea if need processing all columns together is use DataFrame.stack
for MultiIndex Series
, pass to qcut
and last use Series.unstack
(output is different):
df = pd.qcut(df.stack(), 3, labels=['Low','Medium','High']).unstack()
print (df)
0 1 2 3 4
0 Medium High High High High
1 High High Medium Low Medium
2 High Medium Medium High High
3 Low Medium High High Medium
4 Low Low High High Low
5 High Low High High Medium
6 Medium High Medium High Low
7 Medium Medium Medium Low Medium
8 Low Medium Medium High Low
9 Low Low High Medium Medium
10 Medium Low High Low Low
11 High High High Medium Low
12 High Low Low Low Medium
13 Low Medium Medium Low Low
14 Medium Low High Medium Low
15 High Medium Medium Low Medium
16 Low Low High High High
17 Low Low Low Medium Low
18 Medium High Medium Low Medium
19 High Low Low Medium Medium