My code looks like this:
df2['min_salary'] = min_hr.apply(lambda x: int(x.split('-')[0]))
df2['max_salary'] = min_hr.apply(lambda x: int(x.split('-')[1]))
The data it's using is a Salary column whose values look like this: 80 - 100. The min salary works fine, but the max salary keeps raising an error. Am I doing something wrong?
CodePudding user response:
Try it like this. It handles the case where x doesn't contain a '-':
df2['max_salary'] = min_hr.apply(lambda x: int(x.split('-')[1] if len(x.split('-'))>1 else x.split('-')[0]))
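If the error is an IndexError, a value without a '-' is the likely cause. Here is a small plain-Python sketch (no pandas) of what the lambda does per value. The helper name max_from_range and the sample strings are made up for illustration; they are not from the question.

```python
def max_from_range(x):
    """Return the upper bound of a 'low-high' string, or the single value if there is no '-'."""
    parts = x.split('-')
    # int() tolerates surrounding whitespace, so '80 - 100' needs no explicit strip
    return int(parts[1] if len(parts) > 1 else parts[0])


samples = ['80 - 100', '70']  # hypothetical values, not the real Salary data

# The unguarded version indexes [1] unconditionally, so it fails on '70'
try:
    int('70'.split('-')[1])
except IndexError as exc:
    print('unguarded:', exc)

print([max_from_range(s) for s in samples])
```

Note that this only guards against a missing '-'. A value that is not a number at all (empty string, text) would still make int() raise a ValueError.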
CodePudding user response:
def min_max_value(sal_string):
    # Split "80 - 100" into ["80 ", " 100"]; a lone "70" gives ["70"]
    fields = sal_string.split('-')
    if len(fields) > 1:
        if fields[0].strip().isdigit():
            min_field = int(fields[0].strip())
        else:
            min_field = None
        if fields[1].strip().isdigit():
            max_field = int(fields[1].strip())
        else:
            max_field = None
    else:
        if fields[0].strip().isdigit():
            min_field = int(fields[0].strip())
            # No upper bound in the string, so leave max empty
            max_field = None
        else:
            min_field, max_field = None, None
    return min_field, max_field
# Series.apply returns a Series of tuples; convert it to a list to fill both columns
df2[['min_salary','max_salary']] = min_hr.apply(min_max_value).tolist()
You can try something like this.
CodePudding user response:
As Tim points out, you likely have data which doesn't follow the exact format you're trying to split the strings on. You could try this approach, which adds NaN to any columns which didn't produce two values from splitting:
df2[["min_salary", "max_salary"]] = min_hr.str.split("-").apply(pd.Series)
Here's an example output after using that code on the "A" column of this dataframe (and naming the two new columns "Ax" and "Ay"):
       A  Ax   Ay
0  10-20  10   20
1  30-40  30   40
2     70  70  NaN
Note that if you want single salary values to fill into the "max_salary" column, you'll need to use a slightly different approach:
df2[["min_salary", "max_salary"]] = min_hr.str.split("-").apply(lambda x: [np.nan]*(len(x) < 2) + x).to_list()
Which puts 70 in the Ay column:
       A   Ax  Ay
0  10-20   10  20
1  30-40   30  40
2     70  NaN  70
Another approach (and possibly ideal in this particular case) would be to fill NaN laterally:
df2[["min_salary", "max_salary"]] = min_hr.str.split("-").apply(pd.Series).ffill(axis=1)
       A  Ax  Ay
0  10-20  10  20
1  30-40  30  40
2     70  70  70
Note that none of these solutions convert your data to numeric types.
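If you do need numbers, one possible follow-up is pd.to_numeric. This is only a sketch: the sample Series is invented, and it assumes at least one row contains a '-' so that the split yields two columns. It copies a lone value into max_salary with fillna, which for two columns has the same effect as the lateral fill.

```python
import pandas as pd

# Hypothetical data standing in for the Salary column
min_hr = pd.Series(["10-20", "30 - 40", "70"])

# Split once on '-'; rows without a '-' get a missing value in the second column
parts = min_hr.str.split("-", n=1, expand=True)
parts.columns = ["min_salary", "max_salary"]

# Strip whitespace, then convert; anything unparseable becomes NaN instead of raising
salaries = parts.apply(lambda col: pd.to_numeric(col.str.strip(), errors="coerce"))

# Use the single value as the maximum when no upper bound was given
salaries["max_salary"] = salaries["max_salary"].fillna(salaries["min_salary"])

print(salaries.dtypes)
print(salaries)
```

Because max_salary held a NaN before the fill, it comes out as a float column; cast it with astype(int) afterwards if you are sure no missing values remain.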