I have the following pandas dataframe:
PLAYER GRP
Mike F3.03
Max F2.01
El G7.99
Billy G7.09
Steve B13.99
Vecna F3.03
I need to sort the dataframe by the grp
column, first by letter, than by number before the period, then by number after the period, with all sorts being ascending. If there is a tie (see Vecna and Mike), then it should sort by player
ascending. Note that the actual use case does have leading zeros for numbers after the period but not before.
My desired end result is a table sorted as follows:
PLAYER GRP
Steve B13.99
Max F2.01
Mike F3.03
Vecna F3.03
Billy G7.09
El G7.99
Can anybody provide me with a way to do this? I would ideally like to store the procedure as a function which I can then use on any dataframe with a 'Group' column which has the same structure as above?
Sample Data:
import pandas as pd
df = pd.DataFrame({'player': ['Mike', 'Max', 'El', 'Billy', 'Steve', 'Vecna'],
'grp': ['F3.03', 'F2.01', 'G7.99', 'G7.09', 'B13.99', 'F3.03']})
Current Work
I'm pretty new to Python, so this is what I've got so far.
I've been able to isolate each of the respective elements into its own list, but not sure where to go from here:
grp = df['grp']
letters = [re.sub(r'[^a-zA-Z]', '', let) for let in grp]
numbers = [re.sub(r'[^0-9]', '', num.split('.')[0]) for num in grp]
numbers_new = [int(num) for num in [re.sub(r'[^0-9]', '', num_new.split('.')[1]) for num_new in grp]]
list(zip(list(df['player']), grp, letters, numbers, numbers_new))
CodePudding user response:
def sort_custom(d: pd.DataFrame,
primary: str = 'grp',
secondary: str | list = None,
inplace: bool = False) -> pd.DataFrame | None:
"""
Pass a DataFrame containing a LetterNumber column to sort by it.
Defaults to 'grp' columns.
Optional, pass a column or list of other columns to also sort by.
inplace keyword is also possible.
"""
if not inplace:
d = d.copy()
cols = ['l', 'v']
d[cols] = pd.concat([d[primary].str[0],
d[primary].str[1:]
.astype(float)], axis=1).to_numpy()
if secondary:
if isinstance(secondary, list):
d.sort_values(cols secondary, inplace=True)
else:
d.sort_values(cols [secondary], inplace=True)
else:
d.sort_values(cols, inplace=True)
d.drop(cols, axis=1, inplace=True)
return d if not inplace else None
df = pd.DataFrame({'player': ['Mike', 'Max', 'El', 'Billy', 'Steve', 'Vecna'],
'grp': ['F3.03', 'F2.01', 'G7.99', 'G7.09', 'B13.99', 'F3.03']})
sort_custom(d=df, primary='grp', secondary='player', inplace=True)
print(df)
Output:
PLAYER grp
4 Steve B13.99
1 Max F2.01
0 Mike F3.03
5 Vecna F3.03
3 Billy G7.09
2 El G7.99
The issue with just sorting by ['grp', 'player']
is the following:
df2 = pd.DataFrame({'player': ['Bob', 'Joe'], 'grp':['A12.09', 'A2.09']})
print(df2.sort_values(['grp', 'player']))
Output:
player grp
0 Bob A12.09
1 Joe A2.09
Here, according to string sorting, A12.09 < A2.09
, but we want A12.09 > A2.09
.
print(sort_custom(df2))
Output:
player grp
1 Joe A2.09
0 Bob A12.09
Personally, I think the better way of doing this would be permanently splitting the letter and number value and changing everything you do to work with that:
df['num'] = df.grp[1:].astype(float)
df['grp'] = df.grp.str[0]
df = df.sort_values(['grp', 'num', 'player'])
print(df)
Output:
player grp num
4 Steve B 13.99
1 Max F 2.01
0 Mike F 3.03
5 Vecna F 3.03
3 Billy G 7.09
2 El G 7.99
You can always combine them again by doing:
df.grp = df.grp df.num.astype(str)
CodePudding user response:
If let's say the names of the player are not all in capital letters,
df = pd.DataFrame({'player': ['Mike', 'Max', 'El', 'Billy', 'Steve', 'Vecna', 'adam'],
'grp': ['F3.03', 'F2.01', 'G7.99', 'G7.09', 'B13.99', 'F3.03', 'F3.03']})
You can use df.sort_values
with a key function to complete the sort.
df.sort_values(by=['grp','player'],key=lambda col: col.str.lower(),ignore_index=True)
player grp
0 Steve B13.99
1 Max F2.01
2 adam F3.03
3 Mike F3.03
4 Vecna F3.03
5 Billy G7.09
6 El G7.99