I'm using cut() in Pandas to categorize the values of a numerical column.
df['bmi_bin']=pd.cut(df['Body Mass Index'],bins=4,labels=[0,1,2,3])
Output:
0 2
1 2
2 1
3 2
4 1
..
683 1
690 0
694 1
696 0
699 1
I tried putting the same code in a loop:
num_of_bins=4
for i in range(0,num_of_bins):
    df['bmi_bin']=pd.cut(df['Body Mass Index'],bins=num_of_bins, labels=[i])
This is giving me the following error:
ValueError: Bin labels must be one fewer than the number of bin edges
Where did I go wrong? Doesn't labels=[i] give the label values [0,1,2,3], which are fewer than 4?
CodePudding user response:
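Your code tries to split the column into bins and label them, but the number of labels does not match the number of bins. On every iteration of the for loop you ask for 4 bins (5 bin edges), yet labels=[i] is a list holding a single label, whether it is 0, 1, 2 or 3, so pd.cut raises the ValueError. The labels are not accumulated across iterations. If you want to use a loop, build the full label list first and call pd.cut once afterwards, although I'm not sure what output you are expecting:
num_of_bins = 4
lt = []
for i in range(num_of_bins):
    lt.append(i)  # builds the label list [0, 1, 2, 3]
# call cut once, after the loop, with one label per bin
df['bmi_bin'] = pd.cut(df['Body Mass Index'], bins=num_of_bins, labels=lt)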
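For reference, here is a minimal self-contained sketch of the label/bin count rule without any loop. The sample BMI values and the bmi_code column name are made up for illustration; only pd.cut and its bins and labels parameters come from the question. It shows one label per bin via range, the labels=False shortcut that returns integer bin codes, and the single-label call that triggers the error.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({"Body Mass Index": [18.5, 22.0, 27.3, 31.8, 24.9, 35.2]})

num_of_bins = 4

# One label per bin: range(num_of_bins) yields 0, 1, 2, 3.
df["bmi_bin"] = pd.cut(
    df["Body Mass Index"],
    bins=num_of_bins,
    labels=list(range(num_of_bins)),
)

# Alternative: labels=False returns the integer bin codes directly.
# (bmi_code is an illustrative column name, not from the question.)
df["bmi_code"] = pd.cut(df["Body Mass Index"], bins=num_of_bins, labels=False)

# A single label for four bins reproduces the error from the question.
try:
    pd.cut(df["Body Mass Index"], bins=num_of_bins, labels=[0])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(df)
print(error_message)
```

With labels given as a list, the new column is categorical; with labels=False it is a plain integer column, which may be more convenient if the bins are only used as numeric codes.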