Pandas Merge Not Working When Values Are an Exact Match

Time:04-02

Below are my code and DataFrames. stats_df is much bigger than the sample shown. Not sure if it matters, but the column values are exactly as they appear in the actual files. I can't merge the two DataFrames without losing 'Alex Len', even though both have the same PlayerID value of '20000852'.

import pandas as pd

stats_df = pd.read_csv('stats_todate.csv')
matchup_df = pd.read_csv('matchup.csv')

new_df = pd.merge(stats_df, matchup_df[['PlayerID','Matchup','Started','GameStatus']])

I have also tried:

stats_df['PlayerID'] = stats_df['PlayerID'].astype(str)
matchup_df['PlayerID'] = matchup_df['PlayerID'].astype(str)
stats_df['PlayerID'] = stats_df['PlayerID'].str.strip()
matchup_df['PlayerID'] = matchup_df['PlayerID'].str.strip()

Any ideas?

Here are my two Dataframes:

DF1:

PlayerID  SeasonType  Season  Name                Team  Position
20001713  1           2018    A.J. Hammons        MIA   C
20002725  2           2022    A.J. Lawson         ATL   SG
20002038  2           2021    Élie Okobo          BKN   PG
20002742  2           2022    Aamir Simms         NY    PF
20000518  3           2018    Aaron Brooks        MIN   PG
20000681  1           2022    Aaron Gordon        DEN   PF
20001395  1           2018    Aaron Harrison      DAL   SG
20002680  1           2022    Aaron Henry         PHI   SF
20002005  1           2022    Aaron Holiday       PHO   PG
20001981  3           2018    Aaron Jackson       HOU   PF
20002539  1           2022    Aaron Nesmith       BOS   SF
20002714  1           2022    Aaron Wiggins       OKC   SG
20001721  1           2022    Abdel Nader         PHO   SF
20002251  2           2020    Abdul Gaddy         OKC   PG
20002458  1           2021    Adam Mokoka         CHI   SG
20002619  1           2022    Ade Murkey          SAC   PF
20002311  1           2022    Admiral Schofield   ORL   PF
20000783  1           2018    Adreian Payne       ORL   PF
20002510  1           2022    Ahmad Caver         IND   PG
20002498  2           2020    Ahmed Hill          CHA   PG
20000603  1           2022    Al Horford          BOS   PF
20000750  3           2018    Al Jefferson        IND   C
20001645  1           2019    Alan Williams       BKN   PF
20000837  1           2022    Alec Burks          NY    SG
20001882  1           2018    Alec Peters         PHO   PF
20002850  1           2022    Aleem Ford          ORL   SF
20002542  1           2022    Aleksej Pokuševski  OKC   PF
20002301  3           2021    Alen Smailagic      GS    PF
20001763  1           2019    Alex Abrines        OKC   SG
20001801  1           2022    Alex Caruso         CHI   SG
20000852  1           2022    Alex Len            SAC   C

DF2:

PlayerID  Name                Date      Started  Opponent  GameStatus  Matchup
20000681  Aaron Gordon        4/1/2022  1        MIN                   16
20002005  Aaron Holiday       4/1/2022  0        MEM                   21
20002539  Aaron Nesmith       4/1/2022  0        IND                   13
20002714  Aaron Wiggins       4/1/2022  1        DET                   14
20002311  Admiral Schofield   4/1/2022  0        TOR                   10
20000603  Al Horford          4/1/2022  1        IND                   13
20002542  Aleksej Pokuševski  4/1/2022  1        DET                   14
20000852  Alex Len            4/1/2022  1        HOU                   22

CodePudding user response:

You need to specify the column you want to merge on using the on keyword argument:

new_df = pd.merge(stats_df, matchup_df[['PlayerID','Matchup','Started','GameStatus']], on='PlayerID')

Otherwise pandas merges on every column name the two DataFrames have in common, so a row is kept only if all of those columns match, not just PlayerID.
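Here is a minimal sketch of that behaviour. The data is made up, and it assumes stats_df also has a Started column (the question does not show one, so that part is a guess about the full file):

```python
import pandas as pd

# Made-up sample: both frames share 'PlayerID' AND 'Started'.
stats_df = pd.DataFrame({
    "PlayerID": [20000681, 20000852],
    "Name": ["Aaron Gordon", "Alex Len"],
    "Started": [1, 0],  # hypothetical column; differs from matchup_df for Alex Len
})
matchup_df = pd.DataFrame({
    "PlayerID": [20000681, 20000852],
    "Started": [1, 1],
    "Matchup": [16, 22],
})

# No `on`: pandas joins on every shared column, here PlayerID and Started,
# so Alex Len is dropped because his Started values differ.
default_merge = pd.merge(stats_df, matchup_df)

# Explicit key: only PlayerID has to match. The clashing Started columns
# come back as Started_x and Started_y.
keyed_merge = pd.merge(stats_df, matchup_df, on="PlayerID")

print(default_merge["Name"].tolist())
print(keyed_merge["Name"].tolist())
```

If the suffixed duplicate columns are unwanted, one option is to drop them from one side before merging.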

Here is the explanation from the pandas docs:

on : label or list
    Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.
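To see which columns that default would pick for your own data, you can compute the intersection yourself. This is only a sketch with made-up frames standing in for stats_df and matchup_df:

```python
import pandas as pd

# Made-up stand-ins for the real DataFrames.
stats_df = pd.DataFrame({
    "PlayerID": [20000681, 20000852],
    "Name": ["Aaron Gordon", "Alex Len"],
    "Started": [1, 0],  # hypothetical extra shared column
})
matchup_df = pd.DataFrame({
    "PlayerID": [20000681, 20000852],
    "Started": [1, 1],
    "Matchup": [16, 22],
})

# Column names present in both frames: these are the default join keys.
shared = stats_df.columns.intersection(matchup_df.columns).tolist()
print(shared)

# Merging without `on` gives the same result as merging on that list.
implicit = pd.merge(stats_df, matchup_df)
explicit = pd.merge(stats_df, matchup_df, on=shared)
print(implicit.equals(explicit))
```

If the printed list contains anything besides PlayerID, that extra column is a likely reason rows are disappearing.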
