Home > database >  Creating and storing a custom sort procedure as a function
Creating and storing a custom sort procedure as a function

Time:06-28

I have the following pandas dataframe:

PLAYER      GRP
Mike        F3.03
Max         F2.01
El          G7.99
Billy       G7.09
Steve       B13.99
Vecna       F3.03

I need to sort the dataframe by the grp column, first by letter, than by number before the period, then by number after the period, with all sorts being ascending. If there is a tie (see Vecna and Mike), then it should sort by player ascending. Note that the actual use case does have leading zeros for numbers after the period but not before.

My desired end result is a table sorted as follows:

PLAYER      GRP
Steve       B13.99
Max         F2.01
Mike        F3.03
Vecna       F3.03
Billy       G7.09
El          G7.99

Can anybody provide me with a way to do this? I would ideally like to store the procedure as a function which I can then use on any dataframe with a 'Group' column which has the same structure as above?

Sample Data:

import pandas as pd
df = pd.DataFrame({'player': ['Mike', 'Max', 'El', 'Billy', 'Steve', 'Vecna'], 
                   'grp': ['F3.03', 'F2.01', 'G7.99', 'G7.09', 'B13.99', 'F3.03']})

Current Work

I'm pretty new to Python, so this is what I've got so far.

I've been able to isolate each of the respective elements into its own list, but not sure where to go from here:

grp = df['grp']
letters = [re.sub(r'[^a-zA-Z]', '', let) for let in grp]
numbers = [re.sub(r'[^0-9]', '', num.split('.')[0]) for num in grp]
numbers_new = [int(num) for num in [re.sub(r'[^0-9]', '', num_new.split('.')[1]) for num_new in grp]]
list(zip(list(df['player']), grp, letters, numbers, numbers_new))

CodePudding user response:

def sort_custom(d: pd.DataFrame, 
                primary: str = 'grp', 
                secondary: str | list = None,
                inplace: bool = False) -> pd.DataFrame | None:
    """
    Pass a DataFrame containing a LetterNumber column to sort by it.
    Defaults to 'grp' columns.
    Optional, pass a column or list of other columns to also sort by.
    inplace keyword is also possible.
    """
    if not inplace:
        d = d.copy()
    cols = ['l', 'v']
    d[cols] = pd.concat([d[primary].str[0],
                         d[primary].str[1:]
                        .astype(float)], axis=1).to_numpy()
    if secondary:
        if isinstance(secondary, list):
            d.sort_values(cols   secondary, inplace=True)
        else:
            d.sort_values(cols   [secondary], inplace=True)
    else:
        d.sort_values(cols, inplace=True)    
    d.drop(cols, axis=1, inplace=True)
    return d if not inplace else None

df = pd.DataFrame({'player': ['Mike', 'Max', 'El', 'Billy', 'Steve', 'Vecna'], 
                   'grp': ['F3.03', 'F2.01', 'G7.99', 'G7.09', 'B13.99', 'F3.03']})

sort_custom(d=df, primary='grp', secondary='player', inplace=True)
print(df)

Output:

  PLAYER     grp
4  Steve  B13.99
1    Max   F2.01
0   Mike   F3.03
5  Vecna   F3.03
3  Billy   G7.09
2     El   G7.99

The issue with just sorting by ['grp', 'player'] is the following:

df2 = pd.DataFrame({'player': ['Bob', 'Joe'], 'grp':['A12.09', 'A2.09']})
print(df2.sort_values(['grp', 'player']))

Output:

  player     grp
0    Bob  A12.09
1    Joe   A2.09

Here, according to string sorting, A12.09 < A2.09, but we want A12.09 > A2.09.

print(sort_custom(df2))

Output:

  player     grp
1    Joe   A2.09
0    Bob  A12.09

Personally, I think the better way of doing this would be permanently splitting the letter and number value and changing everything you do to work with that:

df['num'] = df.grp[1:].astype(float)
df['grp'] = df.grp.str[0]
df = df.sort_values(['grp', 'num', 'player'])
print(df)

Output:

  player grp    num
4  Steve   B  13.99
1    Max   F   2.01
0   Mike   F   3.03
5  Vecna   F   3.03
3  Billy   G   7.09
2     El   G   7.99

You can always combine them again by doing:

df.grp = df.grp   df.num.astype(str)

CodePudding user response:

If let's say the names of the player are not all in capital letters,

df = pd.DataFrame({'player': ['Mike', 'Max', 'El', 'Billy', 'Steve', 'Vecna', 'adam'], 
                   'grp': ['F3.03', 'F2.01', 'G7.99', 'G7.09', 'B13.99', 'F3.03', 'F3.03']})

You can use df.sort_values with a key function to complete the sort.

df.sort_values(by=['grp','player'],key=lambda col: col.str.lower(),ignore_index=True)

  player     grp
0  Steve  B13.99
1    Max   F2.01
2   adam   F3.03
3   Mike   F3.03
4  Vecna   F3.03
5  Billy   G7.09
6     El   G7.99

  • Related