I've been trying to map a column of my df into 4 categories (binning), but the column contains mixed values, both int and str. It looks something like this:
df['data_column'] = ['22', '8', '11', 'Text', '17', 'Text', '6']
The categories I've been trying to map them to:
- 1 to 10: superb
- 10 to 20: awesome
- 20 to 30: great
- 'Text': text
This is how I've been trying to solve it:
my_criteria = [df['data_column'][df['data_column'] != 'Text'].astype('int64').between(1, 10),
df['data_column'][df['data_column'] != 'Text'].astype('int64').between(10, 20),
df['data_column'][df['data_column'] != 'Text'].astype('int64').between(20, 30),
df['data_column'][df['data_column'] == 'Text']]
my_values = ['superb', 'awesome', 'great', 'text']
df['data_column'] = np.select(my_criteria, my_values, 0)
But I get this error: ValueError: shape mismatch: objects cannot be broadcast to a single shape.
How can I fix this? Any help is welcome. The desired output:
df['data_column'] = ['great', 'superb', 'awesome', 'text', 'awesome', 'text', 'superb']
Thank you in advance!
Answer:
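All values in your condlist for np.select must be the same length. Yours are not: the first three conditions filter out the 'Text' rows and have 5 entries each, while the last one keeps only the 2 'Text' rows.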
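For reference, a minimal sketch of how the np.select approach could be kept, assuming the column holds only numeric strings and the literal 'Text'. Every condition is built on the full column, so each mask has one entry per row. The names `numeric`, `conditions`, `values`, the `category` column and the `'other'` default are illustrative choices, not part of the original question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'data_column': ['22', '8', '11', 'Text', '17', 'Text', '6']})

# Non-numeric entries become NaN, so the Series keeps its full length.
numeric = pd.to_numeric(df['data_column'], errors='coerce')

# Each condition is a boolean Series with one entry per row.
# between() is inclusive on both ends and is False for NaN.
conditions = [
    numeric.between(1, 10),
    numeric.between(10, 20),
    numeric.between(20, 30),
    df['data_column'] == 'Text',
]
values = ['superb', 'awesome', 'great', 'text']

# np.select uses the first condition that is True for each row,
# so a value of exactly 10 lands in 'superb' and 20 in 'awesome'.
# 'category' and the 'other' default are hypothetical names.
df['category'] = np.select(conditions, values, default='other')
print(df['category'].tolist())
```

A string default is used here so that it shares a dtype with the string choices.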
You can use pd.to_numeric with errors='coerce' to force values to convert to numeric; anything that can't be converted becomes NaN.
Then, use pd.cut to create your bins. Convert back to strings from categorical, and replace 'nan' entries with 'text'.
Given:
data_column
0 22
1 8
2 11
3 Text
4 17
5 Text
6 6
Doing:
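# Non-numeric entries such as 'Text' become NaN
df.data_column = pd.to_numeric(df.data_column, errors='coerce')
# include_lowest=True keeps a value of exactly 1 in the first bin
df.data_column = (pd.cut(df.data_column, [1, 10, 20, 30], labels=['superb', 'awesome', 'great'], include_lowest=True)
                    .astype(str)
                    .replace('nan', 'text'))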
Output:
data_column
0 great
1 superb
2 awesome
3 text
4 awesome
5 text
6 superb
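One caveat with this approach: after pd.cut, a number that falls outside the bins is also NaN, so it would end up labeled 'text' as well. Below is a hedged sketch of one way to tell the two cases apart. The sample values 45 and 1, the 'out of range' label, the use of include_lowest=True, and the names `numeric`, `binned` and `result` are all assumptions made for illustration.

```python
import pandas as pd

# Sample data is illustrative: 45 is outside every bin, 1 sits on the lowest edge.
df = pd.DataFrame({'data_column': ['22', '8', '45', 'Text', '1']})

numeric = pd.to_numeric(df['data_column'], errors='coerce')
binned = pd.cut(numeric, [1, 10, 20, 30],
                labels=['superb', 'awesome', 'great'],
                include_lowest=True)

# Move from categorical to object dtype so arbitrary labels can be assigned.
result = binned.astype(object)
# Rows that were never numeric get 'text'.
result[numeric.isna()] = 'text'
# Rows that were numeric but fell outside every bin get a separate label.
result[numeric.notna() & binned.isna()] = 'out of range'

print(result.tolist())
```

Whether out-of-range numbers should get their own label or be treated as an error depends on the data, so this is only one possible design.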