I am trying to fix this NBA game detail table. Every two rows are the same game(GAME_ID) in the table. I want the two rows to be in one row so that the home team and the away team's data will be on the same row.
CodePudding user response:
You can use set_index and unstack, then flatten headers.
Here is come code I did for baseball games, I had to create my own Game No:
# source datafrome
year = '2020'
df = pd.read_excel('https://www.sportsbookreviewsonline.com/scoresoddsarchives/mlb/mlb odds 2011.xlsx')
df = df.dropna(how="all").copy()
df["Date"] = df["Date"].astype(int) # To handle excel float type in some years
df["Date"] = pd.to_datetime(
f"{year}" df["Date"].astype(str).str.rjust(4, "0"), format="%Y%m%d"
)
cols = ["Date", "VH", "Team", "Final", "Close"]
df = df[cols]
df = df[df["VH"].isin(["V", "H"])]
# Flatten teams lines in to games
df["Game No"] = (df["VH"] == "V").cumsum()
df = df.set_index(["Game No", "Date", "VH"]).unstack()
df.columns = df.columns.map("_".join)
df = df.reset_index()
Output:
Game No Date Team_H Team_V Final_H Final_V Close_H Close_V
0 1 2020-03-31 WAS ATL 0 2 119 -139
1 2 2020-03-31 CIN MIL 7 6 -110 -110
2 3 2020-03-31 STL SDG 3 5 -178 158
3 4 2020-03-31 LOS SFO 2 1 -114 -106
4 5 2020-03-31 NYY DET 6 3 -152 132
... ... ... ... ... ... ... ... ...
2462 2463 2020-10-22 TEX STL 7 16 -183 163
2463 2464 2020-10-23 TEX STL 4 0 -171 151
2464 2465 2020-10-24 TEX STL 4 2 -130 110
2465 2466 2020-10-27 STL TEX 10 9 -114 -106
2466 2467 2020-10-28 STL TEX 6 2 -130 110
[2467 rows x 8 columns]