I want to apply a function to the non-empty cells of a column that contains either integers or empty cells. I checked the data type of that column; it is object.
This is part of the DataFrame:
import pandas as pd
from numpy import nan
df = pd.DataFrame(
{'Seam_Height': {0: 72.0, 1: 108.0, 2: nan, 3: nan, 4: 84.0, 5: 96.0,
6: nan, 7: 108.0, 8: 120.0, 9: nan, 10: 120.0, 11: nan,
12: 120.0, 13: 107.0},
'mining_method': {0: 'Longwall', 1: 'Longwall', 2: 'Longwall',
3: 'Longwall', 4: 'Longwall', 5: 'Longwall',
6: 'Longwall', 7: 'Longwall', 8: 'Longwall',
9: 'Longwall', 10: 'Longwall', 11: 'Longwall',
12: 'Longwall', 13: 'Longwall'},
'employee_num ': {0: 508.0, 1: 161.0, 2: nan, 3: nan, 4: 547.0, 5: 354.0,
6: 456.0, 7: nan, 8: 515.0, 9: 515.0, 10: nan, 11: 515.0,
12: 515.0, 13: 235.0}}
)
Seam_Height mining_method employee_num
0 72.0 Longwall 508.0
1 108.0 Longwall 161.0
2 NaN Longwall NaN
3 NaN Longwall NaN
4 84.0 Longwall 547.0
5 96.0 Longwall 354.0
6 NaN Longwall 456.0
7 108.0 Longwall NaN
8 120.0 Longwall 515.0
9 NaN Longwall 515.0
10 120.0 Longwall NaN
11 NaN Longwall 515.0
12 120.0 Longwall 515.0
13 107.0 Longwall 235.0
This is the function that I use to classify seam thickness by seam height. It is a very simple function:
def seam_thickness_class_func(var):
    if var < 43:
        return "V_low"
    if var < 60:
        return "Low"
    if var < 72:
        return "Medium"
    else:
        return "High"
df['Seam_class'] = df.apply(lambda x: seam_thickness_class_func(x["Seam_Height"]) if(pd.notnull(x[0])) else " ", axis = 1)
The function should be applied if the cell contains a number; if the cell is empty, the result should be " ".
I get this error message when I apply the function:
TypeError: '<' not supported between instances of 'str' and 'int'
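A minimal reproduction of the error, assuming (this is an assumption, since the sample above is built from NaN and is therefore float64) that the "empty" cells in the real data are empty strings, which would also explain the object dtype:

```python
import pandas as pd

# Hypothetical data: assumes the "empty" cells hold empty strings ''
s = pd.Series([72.0, '', 108.0, ''], dtype=object)

print(s.dtype)         # object
print(pd.notnull(''))  # True: an empty string is not a null value


def seam_thickness_class_func(var):
    if var < 43:
        return "V_low"
    if var < 60:
        return "Low"
    if var < 72:
        return "Medium"
    return "High"


try:
    # pd.notnull('') is True, so '' still reaches the comparison '' < 43
    s.apply(lambda v: seam_thickness_class_func(v) if pd.notnull(v) else " ")
except TypeError as exc:
    print(exc)  # '<' not supported between instances of 'str' and 'int'
```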
Answer:
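The TypeError means that at least one of the "empty" cells actually holds a string (for example ''). pd.notnull returns True for any string, so that value still reaches the < comparison. Let's convert the column with to_numeric and then use pd.cut instead. pd.cut is specifically designed to "Bin values into discrete intervals."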
We can bin the values as follows:
- (-∞, 43) labeled V_low
- [43, 60) labeled Low
- [60, 72) labeled Medium
- [72, ∞) labeled High
Note that right=False makes each interval's upper bound non-inclusive, which matches the strictly-less-than comparisons in the function from the question.
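A small sketch of what right=False changes, using only boundary values (the values are illustrative, not taken from the question's data):

```python
import numpy as np
import pandas as pd

bins = [-np.inf, 43, 60, 72, np.inf]
labels = ['V_low', 'Low', 'Medium', 'High']
# Values sitting exactly on (or just below) the bin edges
values = [42, 43, 60, 72]

# right=False: intervals are [a, b), i.e. "value < b" as in the function
left_closed = pd.cut(values, bins=bins, labels=labels, right=False)
# right=True (the default): intervals are (a, b], so edges fall one bin lower
right_closed = pd.cut(values, bins=bins, labels=labels, right=True)

print(list(left_closed))   # ['V_low', 'Low', 'Medium', 'High']
print(list(right_closed))  # ['V_low', 'V_low', 'Low', 'Medium']
```

With the default right=True, a seam height of exactly 72 would be labeled Medium instead of High, which would disagree with the original function.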
import numpy as np

df['Seam_class'] = pd.cut(
    pd.to_numeric(df['Seam_Height'], errors='coerce'),  # non-numeric cells become NaN
    bins=[-np.inf, 43, 60, 72, np.inf],  # -np.inf works on all NumPy versions (np.NINF was removed in 2.0)
    labels=['V_low', 'Low', 'Medium', 'High'],
    right=False
)
Resulting df:
Seam_Height mining_method employee_num Seam_class
0 72.0 Longwall 508.0 High
1 108.0 Longwall 161.0 High
2 NaN Longwall NaN NaN
3 NaN Longwall NaN NaN
4 84.0 Longwall 547.0 High
5 96.0 Longwall 354.0 High
6 NaN Longwall 456.0 NaN
7 108.0 Longwall NaN High
8 120.0 Longwall 515.0 High
9 NaN Longwall 515.0 NaN
10 120.0 Longwall NaN High
11 NaN Longwall 515.0 NaN
12 120.0 Longwall 515.0 High
13 107.0 Longwall 235.0 High
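To see what to_numeric(errors='coerce') contributes, here is a sketch on a hypothetical object column that mixes floats, a numeric string and an empty string. It also shows that pd.cut returns a categorical, which is why missing rows appear as NaN above:

```python
import numpy as np
import pandas as pd

# Hypothetical object column: floats, a numeric string and an empty string
raw = pd.Series([72.0, '108', '', 84.0], dtype=object)

# errors='coerce' turns anything that cannot be parsed (here '') into NaN
numeric = pd.to_numeric(raw, errors='coerce')
print(numeric.tolist())  # [72.0, 108.0, nan, 84.0]

classes = pd.cut(numeric, bins=[-np.inf, 43, 60, 72, np.inf],
                 labels=['V_low', 'Low', 'Medium', 'High'], right=False)
print(classes.dtype)            # category
print(classes.isna().tolist())  # [False, False, True, False]
```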
We can additionally use cat.add_categories and fillna so that missing values are replaced with ' ':
import numpy as np

df['Seam_class'] = pd.cut(
    pd.to_numeric(df['Seam_Height'], errors='coerce'),  # non-numeric cells become NaN
    bins=[-np.inf, 43, 60, 72, np.inf],
    labels=['V_low', 'Low', 'Medium', 'High'],
    right=False
).cat.add_categories(' ').fillna(' ')  # register ' ' as a category, then fill NaN with it
Resulting df:
Seam_Height mining_method employee_num Seam_class
0 72.0 Longwall 508.0 High
1 108.0 Longwall 161.0 High
2 NaN Longwall NaN
3 NaN Longwall NaN
4 84.0 Longwall 547.0 High
5 96.0 Longwall 354.0 High
6 NaN Longwall 456.0
7 108.0 Longwall NaN High
8 120.0 Longwall 515.0 High
9 NaN Longwall 515.0
10 120.0 Longwall NaN High
11 NaN Longwall 515.0
12 120.0 Longwall 515.0 High
13 107.0 Longwall 235.0 High
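The add_categories step is needed because a categorical column only accepts values from its categories. A sketch on a three-row hypothetical Series showing that fillna(' ') alone is rejected:

```python
import numpy as np
import pandas as pd

classes = pd.cut(pd.Series([72.0, np.nan, 50.0]),
                 bins=[-np.inf, 43, 60, 72, np.inf],
                 labels=['V_low', 'Low', 'Medium', 'High'], right=False)

try:
    # ' ' is not one of the categories, so filling with it is rejected
    classes.fillna(' ')
except (TypeError, ValueError) as exc:
    # The exact exception type has varied between pandas versions
    print(type(exc).__name__, exc)

# Registering ' ' as a category first makes it a legal fill value
filled = classes.cat.add_categories(' ').fillna(' ')
print(filled.tolist())  # ['High', ' ', 'Low']
```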
If we need to fix the apply version, we should use Series.apply instead, after converting the column with to_numeric so that we only deal with numeric values, and handle the null check inside the function itself:
def seam_thickness_class_func(var):
    # Test isnull here, so NaN produced by to_numeric never reaches the comparisons
    if pd.isnull(var):
        return ' '
    if var < 43:
        return "V_low"
    if var < 60:
        return "Low"
    if var < 72:
        return "Medium"
    return "High"
df['Seam_class'] = pd.to_numeric(
df['Seam_Height'], errors='coerce'
).apply(seam_thickness_class_func)
Resulting df:
Seam_Height mining_method employee_num Seam_class
0 72.0 Longwall 508.0 High
1 108.0 Longwall 161.0 High
2 NaN Longwall NaN
3 NaN Longwall NaN
4 84.0 Longwall 547.0 High
5 96.0 Longwall 354.0 High
6 NaN Longwall 456.0
7 108.0 Longwall NaN High
8 120.0 Longwall 515.0 High
9 NaN Longwall 515.0
10 120.0 Longwall NaN High
11 NaN Longwall 515.0
12 120.0 Longwall 515.0 High
13 107.0 Longwall 235.0 High
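As a quick sanity check, a sketch that rebuilds the Seam_Height values from the sample DataFrame and confirms that the pd.cut version and the Series.apply version produce the same labels:

```python
import numpy as np
import pandas as pd

# Same Seam_Height values as the sample DataFrame in the question
seam = pd.Series([72.0, 108.0, np.nan, np.nan, 84.0, 96.0, np.nan,
                  108.0, 120.0, np.nan, 120.0, np.nan, 120.0, 107.0])


def seam_thickness_class_func(var):
    # NaN rows get ' ' before any comparison is attempted
    if pd.isnull(var):
        return ' '
    if var < 43:
        return "V_low"
    if var < 60:
        return "Low"
    if var < 72:
        return "Medium"
    return "High"


numeric = pd.to_numeric(seam, errors='coerce')
via_apply = numeric.apply(seam_thickness_class_func)
via_cut = pd.cut(numeric,
                 bins=[-np.inf, 43, 60, 72, np.inf],
                 labels=['V_low', 'Low', 'Medium', 'High'],
                 right=False).cat.add_categories(' ').fillna(' ')

# apply yields plain strings, pd.cut a categorical; compare the values as lists
print(via_apply.tolist() == via_cut.tolist())  # True
```

The pd.cut version is vectorized and keeps a categorical dtype; the apply version calls the Python function once per row, which is simpler to extend with arbitrary logic but slower on large frames.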