I have a dataframe like this:
well pad type
'1A22' 22 a1
'2A22' 22 a1
'4A23' 23 a2
I'd like to sort the dataframe by two columns, pad and well, but the second column (well) should be ordered by the number before 'A'. I tried passing the following key for the second column and got an error. Thanks for your help.
df=df.sort_values(by=['pad','well'],key=lambda x1,x2: (int(x1),int(x2.split('A')[0])),ascending=True)
TypeError: <lambda>() missing 1 required positional argument: 'x2'
CodePudding user response:
You can create an intermediate sort key by splitting the strings in the well column around 'A' and converting the first part to int:
df['key'] = df['well'].str.split('A', n=1).str[0].astype(int)
df = df.sort_values(['pad', 'key'])
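If the helper column should not stay in the frame afterwards, one possible variation is to add it with assign and drop it again after sorting. This is only a sketch: the sample frame below is rebuilt from the question, with one extra row ('10A22') added as an assumption to show that the order is numeric rather than alphabetical.

```python
import pandas as pd

# Sample data modelled on the question; the '10A22' row is an added assumption.
df = pd.DataFrame({
    "well": ["4A23", "10A22", "2A22", "1A22"],
    "pad": [23, 22, 22, 22],
    "type": ["a2", "a1", "a1", "a1"],
})

out = (
    # Temporary integer key: the part of 'well' before the first 'A'.
    df.assign(key=df["well"].str.split("A", n=1).str[0].astype(int))
      .sort_values(["pad", "key"])   # pad first, then the numeric well prefix
      .drop(columns="key")           # remove the helper column again
)
print(out)
```

Because assign returns a new frame, the original df is left unchanged.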
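Alternative approach 1 (use sort_values twice: sort by the secondary key first, then do a stable sort on the primary key so that the well order is kept within each pad):
df = df.sort_values('well', key=lambda s: s.str.split('A', n=1).str[0].astype(int)).sort_values('pad', kind='mergesort')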
Alternative approach 2 (Define a custom key func):
def keyfunc(s):
    if s.name == 'well':
        return s.str.split('A', n=1).str[0].astype(int)
    return s
df = df.sort_values(['pad', 'well'], key=keyfunc)
Result:
print(df)
   well  pad type  key
0  1A22   22   a1    1
1  2A22   22   a1    2
2  4A23   23   a2    4
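As for the original error: sort_values does not pass one value per column to key. It calls key once for each column listed in by, passing that whole column as a Series and expecting a Series of the same shape back, so a two-argument lambda fails. A minimal sketch that makes this visible (the shuffled sample rows, the seen list, and the helper name well_number are all hypothetical, added for illustration):

```python
import pandas as pd

# Sample data modelled on the question, shuffled; '10A22' is an added assumption.
df = pd.DataFrame({
    "well": ["4A23", "10A22", "2A22", "1A22"],
    "pad": [23, 22, 22, 22],
    "type": ["a2", "a1", "a1", "a1"],
})

seen = []  # names of the columns that the key function receives


def well_number(s):
    # Hypothetical helper: called once per sort column with a whole Series.
    seen.append(s.name)
    if s.name == "well":
        # Sort 'well' by the integer before the first 'A'.
        return s.str.split("A", n=1).str[0].astype(int)
    return s  # other columns are sorted by their own values


result = df.sort_values(["pad", "well"], key=well_number)
print(seen)    # the column names passed to the key function
print(result)
```

The key function therefore has to be vectorized and branch on s.name when only one of the columns needs special treatment.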