The relevant data in my dataframe looks as follows:
Datapoint | Values |
---|---|
1 | 0.2 |
2 | 0.8 |
3 | 0.4 |
4 | 0.1 |
5 | 1.0 |
6 | 0.6 |
7 | 0.7 |
8 | 0.2 |
9 | 0.5 |
10 | 0.1 |
I want to group the numbers in the Values column into three categories: less than 0.25 as 'low', between 0.25 and 0.75 as 'middle', and greater than 0.75 as 'high'. I want to create a new column that contains 'low', 'middle' or 'high' for each row, based on the value in the Values column.
What I have tried:
```python
def categorize_values("Values"):
    if "Values" > 0.75:
        return 'high'
    elif 'Values' < 0.25:
        return 'low'
    else:
        return 'middle'
```
However, this returns an error.
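Two separate problems are hiding in this attempt. A quoted name such as "Values" is a string literal, and Python only accepts identifiers as parameter names, so the `def` line itself fails to parse. Even with that fixed, comparing the string 'Values' with a float raises a TypeError in Python 3. The sketch below reproduces both errors with `compile()` and a direct comparison; it is only a diagnostic illustration, not part of any answer.

```python
# Reproduce the two errors hidden in the original attempt.
bad_def = '''
def categorize_values("Values"):
    return 'middle'
'''

syntax_failed = False
try:
    # A string literal is not a valid parameter name, so this fails to parse.
    compile(bad_def, "<question>", "exec")
except SyntaxError as exc:
    syntax_failed = True
    print("SyntaxError:", exc.msg)

type_failed = False
try:
    # Comparing a str with a float is not supported in Python 3.
    'Values' < 0.25
except TypeError as exc:
    type_failed = True
    print("TypeError:", exc)
```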
Answer:
Remove the quotes around Values so that it is a parameter name rather than a string literal. That would look like this:
```python
def categorize_values(Values):
    if Values > 0.75:
        return 'high'
    elif Values < 0.25:
        return 'low'
    else:
        return 'middle'
```
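Called on plain numbers, the corrected function shows how its strict comparisons treat the boundaries: both 0.25 and 0.75 land in 'middle'. A quick check on a few hypothetical sample values:

```python
def categorize_values(Values):
    # Thresholds from the question: below 0.25 is low, above 0.75 is high.
    if Values > 0.75:
        return 'high'
    elif Values < 0.25:
        return 'low'
    else:
        return 'middle'

# The boundaries themselves fall into 'middle' because both tests are strict.
samples = [0.1, 0.25, 0.5, 0.75, 0.8]
labels = [categorize_values(v) for v in samples]
print(labels)  # ['low', 'middle', 'middle', 'middle', 'high']
```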
Answer:
If you're using a DataFrame, pandas has a built-in function for this, pd.cut():
```python
import pandas as pd
import numpy as np
from io import StringIO

df = pd.read_csv(StringIO('''Datapoint Values
1 0.2
2 0.8
3 0.4
4 0.1
5 1.0
6 0.6
7 0.7
8 0.2
9 0.5
10 0.1'''), sep=r'\s+')  # columns are separated by whitespace, not tabs

# Open-ended outer edges so every value falls into a bin
# (with 0 as the lowest edge, a value of exactly 0 would become NaN).
df['category'] = pd.cut(df['Values'], [-np.inf, 0.25, 0.75, np.inf],
                        labels=['low', 'middle', 'high'])
print(df)
```

Output:

```
   Datapoint  Values  category
0          1     0.2       low
1          2     0.8      high
2          3     0.4    middle
3          4     0.1       low
4          5     1.0      high
5          6     0.6    middle
6          7     0.7    middle
7          8     0.2       low
8          9     0.5    middle
9         10     0.1       low
```
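pd.cut uses half-open intervals, so a value sitting exactly on 0.25 or 0.75 is labelled according to the `right` parameter. The sketch below compares both settings on a few hypothetical boundary values. Note that neither setting reproduces the if/elif function exactly, since that function puts both 0.25 and 0.75 in 'middle'.

```python
import numpy as np
import pandas as pd

values = pd.Series([0.1, 0.25, 0.5, 0.75, 0.8])
bins = [-np.inf, 0.25, 0.75, np.inf]
labels = ['low', 'middle', 'high']

# Default right=True: intervals are (-inf, 0.25], (0.25, 0.75], (0.75, inf).
right_closed = pd.cut(values, bins, labels=labels)

# right=False: intervals are [-inf, 0.25), [0.25, 0.75), [0.75, inf).
left_closed = pd.cut(values, bins, labels=labels, right=False)

print(pd.DataFrame({'value': values,
                    'right=True': right_closed,
                    'right=False': left_closed}))
```

With the default, 0.25 becomes 'low'; with `right=False`, 0.75 becomes 'high'.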
Answer:
First of all, a function parameter must be a name, not a constant such as a string literal. Fix your function first, like this:
```python
def categorize_values(Values):
    if Values > 0.75:
        return 'high'
    elif Values < 0.25:
        return 'low'
    else:
        return 'middle'
```
Then apply that function to your 'Values' column with Series.apply:
```python
df['Category'] = df['Values'].apply(categorize_values)
df.head()
```
On my own sample data (with DataPoint as the index), this produces the following DataFrame:
```
           Values  Category
DataPoint
1            0.22       low
2            0.32    middle
3            0.55    middle
4            0.75    middle
5            0.12       low
```
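For comparison with the pd.cut answer, here is the apply approach run end to end on the question's own data. The DataFrame is built directly in code rather than loaded from a file, which is an assumption about how the data arrives; the function is the corrected one from the answers, with the parameter renamed to `value` for readability.

```python
import pandas as pd

def categorize_values(value):
    # Same thresholds as the question: strictly above 0.75 is high,
    # strictly below 0.25 is low, everything else is middle.
    if value > 0.75:
        return 'high'
    elif value < 0.25:
        return 'low'
    return 'middle'

# The question's data, built directly instead of parsed from text.
df = pd.DataFrame({
    'Datapoint': range(1, 11),
    'Values': [0.2, 0.8, 0.4, 0.1, 1.0, 0.6, 0.7, 0.2, 0.5, 0.1],
})

# Series.apply calls the function once per element of the column.
df['Category'] = df['Values'].apply(categorize_values)
print(df)
```

Because apply calls Python code row by row, pd.cut is usually the faster choice on large columns; apply is easier to extend with arbitrary logic.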