I have a DataFrame
with UPPERCASE
column names from a database that I'd like to sort. Some of them are prefixed with an underscore _
. The default sorting puts them at the end. Do I have to convert them to lowercase, then sort, then convert them back to uppercase or is there an easier way?
Here's some sample data. When the values of cl2
are lowercase then the order is correct. With uppercase not so much.
pd.DataFrame({
"cl1": ["foo", "foo", "foo"],
"cl2": ["_FOO", "BAR", "BAZ"]}
).sort_values(["cl1", "cl2"])
CodePudding user response:
If conversion to lowercase works for you, you can use it as a key:
pd.DataFrame({
"cl1": ["foo", "foo", "foo"],
"cl2": ["_FOO", "BAR", "BAZ"]}
).sort_values(["cl1", "cl2"], key=lambda s: s.str.lower())
Other option, sort by the strings without _
, then the presence of _
:
(pd.DataFrame({
"cl1": ["foo", "foo", "foo"],
"cl2": ["_FOO", "BAR", "BAZ"]})
.sort_values(["cl1", "cl2"], key=lambda s: s.str.lstrip('_'))
.sort_values(["cl1", "cl2"], key=lambda s: ~s.str.startswith('_'), kind='stable')
)
output:
cl1 cl2
0 foo _FOO
1 foo BAR
2 foo BAZ