How to sort by 2 columns, with one column sorted using a function

Time:04-01

I have a dataframe like this:

 well    pad  type
 '1A22'  22   a1
 '2A22'  22   a1
 '4A23'  23   a2

I'd like to sort the dataframe by two columns, pad and then well, but the second column (well) should be ordered by the number before 'A'. I tried passing a key that handles both columns at once, as shown below, and got an error. Thanks for your help.

df=df.sort_values(by=['pad','well'],key=lambda x1,x2: (int(x1),int(x2.split('A')[0])),ascending=True)

TypeError: <lambda>() missing 1 required positional argument: 'x2'

CodePudding user response:

You can create an intermediate sort key by splitting the strings in the well column around 'A' and converting the first part to int:

df['key'] = df['well'].str.split('A', n=1).str[0].astype(int)
df = df.sort_values(['pad', 'key'])
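
As a self-contained sketch of this approach: the sample rows below are modeled on the question, and the '10A22' row is a made-up addition to show where numeric and string ordering differ. Dropping the helper column at the end is optional and is my assumption about what is usually wanted.

```python
import pandas as pd

# Sample data modeled on the question; '10A22' is a hypothetical extra row
# that sorts differently as a string ('10A22' < '2A22') than as a number.
df = pd.DataFrame({
    "well": ["10A22", "2A22", "4A23", "1A22"],
    "pad": [22, 22, 23, 22],
    "type": ["a1", "a1", "a2", "a1"],
})

# Temporary integer key: the digits before the first 'A'.
df["key"] = df["well"].str.split("A", n=1).str[0].astype(int)

# Sort by pad, then by the numeric key, and discard the helper column.
result = df.sort_values(["pad", "key"]).drop(columns="key")
print(result)
```

Within pad 22 the wells come out as 1A22, 2A22, 10A22, which a plain string sort on well would not give.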

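Alternative approach 1 (use sort_values twice: sort by the secondary column well first, then do a stable sort by pad so the well order is kept within each pad):

df = df.sort_values('well', key=lambda s: s.str.split('A', n=1).str[0].astype(int)).sort_values('pad', kind='stable')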
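
When sorting in two passes, the order of the passes matters: the last pass decides the primary order, so the secondary key (well) has to be sorted first, and the pass on pad has to be stable to keep that order among equal pads. A minimal sketch on made-up rows (the helper name well_number is hypothetical):

```python
import pandas as pd

# Hypothetical rows chosen so that pad order and well order disagree.
df = pd.DataFrame({
    "well": ["2A23", "10A22", "1A23", "2A22"],
    "pad": [23, 22, 23, 22],
})

def well_number(s):
    # The key receives the whole column as a Series and must return
    # a Series of the same length: here, the integer before 'A'.
    return s.str.split("A", n=1).str[0].astype(int)

# Pass 1: order by the secondary key (numeric part of well).
by_well = df.sort_values("well", key=well_number)

# Pass 2: stable sort by the primary key, which keeps the pass-1
# order among rows that share the same pad.
result = by_well.sort_values("pad", kind="stable")
print(result)
```

The default sort algorithm for a single column is not guaranteed to be stable, which is why kind='stable' is passed explicitly in the second pass.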

Alternative approach 2 (Define a custom key func):

def keyfunc(s):
    if s.name == 'well':
        return s.str.split('A', n=1).str[0].astype(int)
    return s

df = df.sort_values(['pad', 'well'], key=keyfunc)
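
This also explains the TypeError in the question: sort_values does not pass the sort columns to key together. It calls key separately for each column in by, each time with a single Series argument, so a two-argument lambda is always missing its second argument. A small sketch that records what the key function receives (the seen list and the sample rows are illustrative additions):

```python
import pandas as pd

# Hypothetical rows; '10A22' vs '2A22' separates numeric from string order.
df = pd.DataFrame({
    "well": ["10A22", "2A22", "4A23"],
    "pad": [22, 22, 23],
})

seen = []  # names of the columns handed to the key function

def keyfunc(s):
    # s is one whole column (a Series), never a pair of columns.
    seen.append(s.name)
    if s.name == "well":
        return s.str.split("A", n=1).str[0].astype(int)
    return s  # leave other columns (pad) unchanged

result = df.sort_values(["pad", "well"], key=keyfunc)
print(seen)
print(result)
```

The recorded names show that keyfunc was invoked for both pad and well, one column per call.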

Result (shown for the first approach, which keeps the key column):

print(df)

   well  pad type  key
0  1A22   22   a1    1
1  2A22   22   a1    2
2  4A23   23   a2    4