Home > OS >  Pandas sort column with numerical string
Pandas sort column with numerical string

Time:10-13

I have a DataFrame below:

col1

Numb10
Numb11
Numb12
Numb7
Numb8

How can I sort with number order:

col1

Numb7
Numb8
Numb10
Numb11
Numb12

I tried but got error TypeError: cannot convert the series to <class 'int'>.

df.sort_values(by = "col1", key = (lambda x: int(x[4:])))

Update with one missing in col1

CodePudding user response:

key in sort_values takes the Series as parameter instead of individual element. From the docs:

Apply the key function to the values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect a Series and return a Series with the same shape as the input. It will be applied to each column in by independently.

In your case, you can use .str and astype for slicing and type convertion:

df.sort_values(by='col1', key=lambda s: s.str[4:].astype(int))
     col1
3   Numb7
4   Numb8
0  Numb10
1  Numb11
2  Numb12

CodePudding user response:

Your x[4:] might not always be integers. You can verify with

# convert to numerical values, float, not integers
extracted_nums = pd.to_numeric(df['col1'].str[4:], errors='coerce')

# check for invalid values
# if not `0` means you have something that are not numerical
print(extracted_nums.isna().any())

# sort by values
df.loc[extracted_nums.sort_values().index]
  • Related