Home > Software design >  Sort underscore before upercase letters
Sort underscore before upercase letters

Time:10-07

I have a DataFrame with UPPERCASE column names from a database that I'd like to sort. Some of them are prefixed with an underscore _. The default sorting puts them at the end. Do I have to convert them to lowercase, then sort, then convert them back to uppercase or is there an easier way?

Here's some sample data. When the values of cl2 are lowercase then the order is correct. With uppercase not so much.

pd.DataFrame({
   "cl1": ["foo", "foo", "foo"], 
   "cl2": ["_FOO", "BAR", "BAZ"]}
).sort_values(["cl1", "cl2"])

CodePudding user response:

If conversion to lowercase works for you, you can use it as a key:

pd.DataFrame({
   "cl1": ["foo", "foo", "foo"], 
   "cl2": ["_FOO", "BAR", "BAZ"]}
).sort_values(["cl1", "cl2"], key=lambda s: s.str.lower())

Other option, sort by the strings without _, then the presence of _:

(pd.DataFrame({
   "cl1": ["foo", "foo", "foo"], 
   "cl2": ["_FOO", "BAR", "BAZ"]})
   .sort_values(["cl1", "cl2"], key=lambda s: s.str.lstrip('_'))
   .sort_values(["cl1", "cl2"], key=lambda s: ~s.str.startswith('_'), kind='stable')
)

output:

   cl1   cl2
0  foo  _FOO
1  foo   BAR
2  foo   BAZ
  • Related