Pandas - How to add a counter column based on date after a groupby?-CodePudding

I have a dataframe in the following format (working with sports data):

Team	Player	Date	GameID
Bears	John	2022-10-01	A1
Bears	Dave	2022-10-01	A1
Bears	Steve	2022-10-01	A1
Bulls	Connor	2022-10-01	C2
Bulls	Jack	2022-10-01	C2
Bears	John	2022-10-03	A3

Basically, I want to be able to sort by team name and date, and add a column called GameNum that counts from 1 to 82 (82 games in the dataset for each team) based on the game number for the season, like below:

Team	Player	Date	GameID	GameNum
Bears	John	2022-10-01	A1	1
Bears	Dave	2022-10-01	A1	1
Bears	Steve	2022-10-01	A1	1
Bulls	Connor	2022-10-01	C2	1
Bulls	Jack	2022-10-01	C2	1
Bears	John	2022-10-03	A3	2

I can do this manually by taking a sub-dataframe of each unique team, sorting by game date, and then adding an iterator value from 1 to 82 and then unioning the results for each team, but I was wondering if there was a "cleaner" way to do this without resorting to for-loops and unioning based on teams.

CodePudding user response：

here is one way to do it

#create a dictionary of GameNum, with key being the index of the dataframe

d=dict(df.loc[~df.duplicated(subset=['Team','Date'])] # choose unique Team and Date
       .groupby(['Team'], as_index=False)             # groupby Team
       .cumcount() 1)                                 # create a count of Game


# map game number to Game based on index
df['GameNum']=df.index.map(d)

# ffill null values
df['GameNum'].ffill(inplace=True) 

# convert game num to int
df['GameNum']=df['GameNum'].astype(int )
df

Team    Player  Date    GameID  GameNum
0   Bears   John    2022-10-01  A1  1
1   Bears   Dave    2022-10-01  A1  1
2   Bears   Steve   2022-10-01  A1  1
3   Bulls   Connor  2022-10-01  C2  1
4   Bulls   Jack    2022-10-01  C2  1
5   Bears   John    2022-10-03  A3  2

CodePudding user response：

I'm not really sure on the manual aspect you want to make cleaner but you can try below solution for your task.

It provides a basic setup for your problem - the only thing I've changed is data column, which is now named 'value' and contains string values.

import pandas as pd

data = {'team':["bears", "bears", "bears", "bulls", "bulls", "bears"], 'value':["a", "a", "a", "a", "c", "b"]}
df = pd.DataFrame(data, columns = ["team", "value"])

df_grouped = df.groupby('team')
df_team_list = [df_grouped.get_group(x) for x in df_grouped.groups]

df_team_list_with_game_num = list()
for df_team in df_team_list:
    df_sorted = df_team.sort_values(by=["value"])
    unique_values = df_sorted["value"].unique()

    game_num_map = dict()
    for i, value in enumerate(unique_values):
        game_num_map[value] = i
    
    df_sorted["game_num"] = df_sorted["value"].apply(lambda x: game_num_map[x])
    df_team_list_with_game_num.append(df_sorted)

df_with_game_num = pd.concat(df_team_list_with_game_num)
print(df_with_game_num)

CodePudding user response：

You could also try something like:


df["GameNum"] = (
    df.groupby("Team")["Date"]
    .transform(lambda grp: grp.rank(numeric_only=False, method="dense"))
    .astype(int)
)

Output: