I have a DataFrame below:
col1
Numb10
Numb11
Numb12
Numb7
Numb8
How can I sort with number order:
col1
Numb7
Numb8
Numb10
Numb11
Numb12
I tried but got error TypeError: cannot convert the series to <class 'int'>
.
df.sort_values(by = "col1", key = (lambda x: int(x[4:])))
Update with one missing in col1
CodePudding user response:
key
in sort_values
takes the Series as parameter instead of individual element. From the docs:
Apply the key function to the values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect a Series and return a Series with the same shape as the input. It will be applied to each column in by independently.
In your case, you can use .str
and astype
for slicing and type convertion:
df.sort_values(by='col1', key=lambda s: s.str[4:].astype(int))
col1
3 Numb7
4 Numb8
0 Numb10
1 Numb11
2 Numb12
CodePudding user response:
Your x[4:]
might not always be integers. You can verify with
# convert to numerical values, float, not integers
extracted_nums = pd.to_numeric(df['col1'].str[4:], errors='coerce')
# check for invalid values
# if not `0` means you have something that are not numerical
print(extracted_nums.isna().any())
# sort by values
df.loc[extracted_nums.sort_values().index]