Bin values into groups

Time:11-09

The relevant data in my dataframe looks as follows:

Datapoint Values
1 0.2
2 0.8
3 0.4
4 0.1
5 1.0
6 0.6
7 0.7
8 0.2
9 0.5
10 0.1

I would like to group the numbers in the Values column into three categories: less than 0.25 as 'low', between 0.25 and 0.75 as 'middle', and greater than 0.75 as 'high'. I want to create a new column that contains 'low', 'middle' or 'high' for each row, based on the data in the Values column.

What I have tried:

def categorize_values("Values"):
    if "Values" > 0.75:
        return 'high'
    elif 'Values' < 0.25:
        return 'low'
    else:
        return 'middle'

However, this raises an error.

CodePudding user response:

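Remove the quotation marks around Values. A function parameter must be a plain name, not a string literal, and inside the function a quoted 'Values' is just a string, so it cannot be compared with a number. The corrected function looks like this: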

def categorize_values(Values):
    if Values > 0.75:
        return 'high'
    elif Values < 0.25:
        return 'low'
    else:
        return 'middle'
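
As a quick check, the corrected function can be called on a few individual numbers. This is only a sketch; the test values, including the two boundary values 0.25 and 0.75, are chosen here for illustration and are not from the question.

```python
def categorize_values(Values):
    if Values > 0.75:
        return 'high'
    elif Values < 0.25:
        return 'low'
    else:
        return 'middle'

# Illustrative inputs (assumed, not from the question's data).
for v in [0.1, 0.25, 0.5, 0.75, 0.8]:
    print(v, categorize_values(v))
```

Because the comparisons are strict, values of exactly 0.25 and 0.75 fall into 'middle'.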

CodePudding user response:

If you're using a DataFrame, pandas has a built-in function for this, pd.cut():

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO('''Datapoint  Values
1   0.2
2   0.8
3   0.4
4   0.1
5   1.0
6   0.6
7   0.7
8   0.2
9   0.5
10  0.1'''), sep=r'\s+')  # columns are separated by whitespace

df['category'] = pd.cut(df['Values'], [0, 0.25, 0.75, df['Values'].max()], labels=['low', 'middle', 'high'])

#output
>>> df
   Datapoint  Values category
0          1     0.2      low
1          2     0.8     high
2          3     0.4   middle
3          4     0.1      low
4          5     1.0     high
5          6     0.6   middle
6          7     0.7   middle
7          8     0.2      low
8          9     0.5   middle
9         10     0.1      low
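
One thing to watch with pd.cut() is how the bin edges are treated. By default the intervals are closed on the right, so with the bins above a value of exactly 0.25 lands in 'low', and a value of exactly 0 would fall outside the first interval and become NaN. The following is a minimal sketch of one way to handle this, assuming open-ended outer edges are acceptable; the sample values and the column names right_closed and left_closed are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative values that include the edge cases 0.0, 0.25 and 0.75.
df = pd.DataFrame({"Values": [0.0, 0.2, 0.25, 0.5, 0.75, 0.8, 1.0]})

# Infinite outer edges so that no value falls outside the bins.
bins = [-np.inf, 0.25, 0.75, np.inf]
labels = ["low", "middle", "high"]

# Default (right=True): intervals are (-inf, 0.25], (0.25, 0.75], (0.75, inf)
df["right_closed"] = pd.cut(df["Values"], bins=bins, labels=labels)

# right=False: intervals are [-inf, 0.25), [0.25, 0.75), [0.75, inf)
df["left_closed"] = pd.cut(df["Values"], bins=bins, labels=labels, right=False)

print(df)
```

Neither setting reproduces the question's rule exactly at both boundaries (strictly below 0.25 is 'low', strictly above 0.75 is 'high'): right=True puts 0.25 in 'low', while right=False puts 0.75 in 'high'. If the data never hits the boundaries exactly, either choice gives the same result.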

CodePudding user response:

First of all, a function parameter must be a name, not a string literal such as "Values". Fix your function first, like this:

def categorize_values(Values):
    if Values > 0.75:
        return 'high'
    elif Values < 0.25:
        return 'low'
    else:
        return 'middle'

Then you can apply that function to your 'Values' column as below:

df['Category'] = df['Values'].apply(categorize_values)

df.head()

This generates a DataFrame like the following (the sample values shown differ from those in the question):

           Values   Category
DataPoint       
1          0.22     low
2          0.32     middle
3          0.55     middle
4          0.75     middle
5          0.12     low
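
For larger frames, the same rule can also be written without a row-by-row apply(). The following is a minimal sketch using np.select, which is a different technique from the one in the answer above; it rebuilds the question's Values column by hand and keeps the same strict comparisons as categorize_values.

```python
import numpy as np
import pandas as pd

# The Values column from the question, rebuilt by hand for this sketch.
df = pd.DataFrame({"Values": [0.2, 0.8, 0.4, 0.1, 1.0, 0.6, 0.7, 0.2, 0.5, 0.1]})

# Conditions are checked in order; rows matching none of them get the default.
conditions = [df["Values"] > 0.75, df["Values"] < 0.25]
choices = ["high", "low"]
df["Category"] = np.select(conditions, choices, default="middle")

print(df)
```

As in the function, values of exactly 0.25 and 0.75 end up in 'middle'.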