Below is my code and Dataframes. stats_df
is much bigger. Not sure if it matters, but the column values are EXACTLY as they appear in the actual files. I can't merge the two DFs without losing 'Alex Len' even though both DFs have the same PlayerID value of '20000852'
stats_df = pd.read_csv('stats_todate.csv')
matchup_df = pd.read_csv('matchup.csv')
new_df = pd.merge(stats_df, matchup_df[['PlayerID','Matchup','Started','GameStatus']])
I have also tried:
stats_df['PlayerID'] = stats_df['PlayerID'].astype(str)
matchup_df['PlayerID'] = matchup_df['PlayerID'].astype(str)
stats_df['PlayerID'] = stats_df['PlayerID'].str.strip()
matchup_df['PlayerID'] = matchup_df['PlayerID'].str.strip()
Any ideas?
Here are my two Dataframes:
DF1:
PlayerID SeasonType Season Name Team Position
20001713 1 2018 A.J. Hammons MIA C
20002725 2 2022 A.J. Lawson ATL SG
20002038 2 2021 Élie Okobo BKN PG
20002742 2 2022 Aamir Simms NY PF
20000518 3 2018 Aaron Brooks MIN PG
20000681 1 2022 Aaron Gordon DEN PF
20001395 1 2018 Aaron Harrison DAL SG
20002680 1 2022 Aaron Henry PHI SF
20002005 1 2022 Aaron Holiday PHO PG
20001981 3 2018 Aaron Jackson HOU PF
20002539 1 2022 Aaron Nesmith BOS SF
20002714 1 2022 Aaron Wiggins OKC SG
20001721 1 2022 Abdel Nader PHO SF
20002251 2 2020 Abdul Gaddy OKC PG
20002458 1 2021 Adam Mokoka CHI SG
20002619 1 2022 Ade Murkey SAC PF
20002311 1 2022 Admiral Schofield ORL PF
20000783 1 2018 Adreian Payne ORL PF
20002510 1 2022 Ahmad Caver IND PG
20002498 2 2020 Ahmed Hill CHA PG
20000603 1 2022 Al Horford BOS PF
20000750 3 2018 Al Jefferson IND C
20001645 1 2019 Alan Williams BKN PF
20000837 1 2022 Alec Burks NY SG
20001882 1 2018 Alec Peters PHO PF
20002850 1 2022 Aleem Ford ORL SF
20002542 1 2022 Aleksej Pokuševski OKC PF
20002301 3 2021 Alen Smailagic GS PF
20001763 1 2019 Alex Abrines OKC SG
20001801 1 2022 Alex Caruso CHI SG
20000852 1 2022 Alex Len SAC C
DF2:
PlayerID Name Date Started Opponent GameStatus Matchup
20000681 Aaron Gordon 4/1/2022 1 MIN 16
20002005 Aaron Holiday 4/1/2022 0 MEM 21
20002539 Aaron Nesmith 4/1/2022 0 IND 13
20002714 Aaron Wiggins 4/1/2022 1 DET 14
20002311 Admiral Schofield 4/1/2022 0 TOR 10
20000603 Al Horford 4/1/2022 1 IND 13
20002542 Aleksej Pokuševski 4/1/2022 1 DET 14
20000852 Alex Len 4/1/2022 1 HOU 22
CodePudding user response:
You need to specify the column you want to merge on using the on
keyword argument:
new_df = pd.merge(stats_df, matchup_df[['PlayerID','Matchup','Started','GameStatus']], on=['PayerID'])
Otherwise it will merge using all of the shared columns.
Here is the explanation from the pandas docs:
on
: label or list Column or index level names to join on. These must be found in both DataFrames. Ifon
is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.