x.split giving IndexError: list index out of range

My code looks like this

   df2['min_salary'] = min_hr.apply(lambda x: int(x.split('-')[0]))
   df2['max_salary'] = min_hr.apply(lambda x: int(x.split('-')[1]))

The data it's using is a Salary column with values that look like this: 80 - 100. The min salary works fine, but the max salary keeps raising the error. Am I doing something wrong?

CodePudding user response：

Try it like this. It falls back to the first (and only) element when 'x' doesn't contain a '-':

df2['max_salary'] = min_hr.apply(lambda x: int(x.split('-')[1] if len(x.split('-'))>1 else x.split('-')[0]))
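
Before patching around the problem, it may help to see which rows are actually triggering the error. This is a minimal sketch on made-up data: the `salary` Series is a hypothetical stand-in for `min_hr`, and it simply counts how many pieces each value splits into.

```python
import pandas as pd

# Hypothetical stand-in for min_hr: the last row has no "-" separator
salary = pd.Series(["80 - 100", "50 - 70", "120"])

# Number of pieces each value splits into
n_parts = salary.str.split("-").str.len()

# Rows with fewer than two pieces are the ones where [1] raises IndexError
bad_rows = salary[n_parts < 2]
print(bad_rows)
```

Any row that shows up in `bad_rows` has no second element, which is why index 0 works and index 1 fails.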

CodePudding user response：

def min_max_value(sal_string):
    # "80 - 100" -> ["80", "100"]; a value without "-" gives a single field
    fields = [f.strip() for f in sal_string.split('-')]
    # Non-numeric text becomes None instead of raising ValueError
    min_field = int(fields[0]) if fields[0].isdigit() else None
    if len(fields) > 1:
        max_field = int(fields[1]) if fields[1].isdigit() else None
    else:
        # No second field, so there is no max value
        max_field = None
    return min_field, max_field

# Series.apply has no result_type argument, so expand the tuples into columns explicitly
df2[['min_salary','max_salary']] = pd.DataFrame(min_hr.apply(min_max_value).tolist(), index=min_hr.index)

You can try something like this.

CodePudding user response：

As Tim points out, you likely have data which doesn't follow the exact format you're trying to split the strings on. You could try this approach, which fills in NaN for any row that didn't produce two values when split:

df2[["min_salary", "max_salary"]] = min_hr.str.split("-").apply(pd.Series)

Here's an example output after using that code on the "A" column of this dataframe (and naming the two new columns "Ax" and "Ay"):

       A  Ax   Ay
0  10-20  10   20
1  30-40  30   40
2     70  70  NaN

Note that if you want single salary values to fill into the "max_salary" column, you'll need to use a slightly different approach:

df2[["min_salary", "max_salary"]] = min_hr.str.split("-").apply(lambda x: [np.nan]*(len(x) < 2) + x).to_list()

Which puts 70 in the Ay column:

       A   Ax  Ay
0  10-20   10  20
1  30-40   30  40
2     70  NaN  70
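
The padding in that lambda relies on Python treating a boolean as 0 or 1 when repeating a list. A small sketch of just that step, pulled out into a hypothetical helper named `pad_left` (the name is only for illustration):

```python
import numpy as np

def pad_left(parts):
    # len(parts) < 2 is a bool; True acts as 1 and False as 0 in list repetition,
    # so a NaN is prepended only when the split produced a single piece
    return [np.nan] * (len(parts) < 2) + parts

print(pad_left(["10", "20"]))  # two pieces: returned unchanged
print(pad_left(["70"]))        # one piece: NaN is placed in front
```

Because the NaN goes in front, the single value ends up in the second column rather than the first.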

Another approach (and possibly ideal in this particular case) would be to fill NaN laterally:

df2[["min_salary", "max_salary"]] = min_hr.str.split("-").apply(pd.Series).ffill(axis=1)

       A  Ax  Ay
0  10-20  10  20
1  30-40  30  40
2     70  70  70

Note that none of these solutions convert your data to numeric types.
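
If you do want numbers, one possible follow-up is to strip the whitespace left over from the split and run each column through `pd.to_numeric`. This is a sketch under assumptions: the `salary` Series is hypothetical sample data standing in for `min_hr`, and `errors="coerce"` is chosen here so that unparsable text becomes NaN instead of raising.

```python
import pandas as pd

# Hypothetical stand-in for min_hr, including a row without a "-"
salary = pd.Series(["80 - 100", "50 - 70", "120"])

# Split and spread into columns; the row without "-" gets NaN in the second column
parts = salary.str.split("-").apply(pd.Series)

# Strip stray spaces, then convert each column; bad text becomes NaN
numeric = parts.apply(lambda col: pd.to_numeric(col.str.strip(), errors="coerce"))

# Fill a missing max from the min value in the same row
numeric = numeric.ffill(axis=1)
numeric.columns = ["min_salary", "max_salary"]
print(numeric)
```

Converting before the lateral fill keeps the fill working on numbers rather than strings.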