I have two datframes that have different numbers of rows and well as different numbers of columns.
row_List1:
date team_home team_away goals_home goals_away shootout_win competition
1 2018-06-04 India Kenya 3 0 NaN Friendly 2018
2 2018-06-06 Armenia Moldova 0 0 NaN Friendly 2018
3 2018-06-09 Italy Netherlands 1 1 NaN Friendly 2018
row_List2:
date team_home team_away goals_home goals_away shootout_win competition venue
1 2018-06-04 India Kenya 3 0 NaN Friendly 2018 Home
2 2018-06-05 USA Pakistan 8 5 NaN Friendly 2018 Nuetral
3 2018-06-06 Moldova Armenia 0 0 NaN Friendly 2018 Away
4 2018-06-07 India Srilanka 2 0 NaN Friendly 2018 Home
3 2018-06-09 Italy Netherlands 1 1 NaN Friendly 2018 Away
6 2018-06-04 India Kenya 3 0 NaN Friendly 2018 Home
So row_List2 has more columns and more rows than row_List1.
row_List2 has venues of all matches. I need the to add a column venues in row_List1 and check for a match in row_List1, if it exists in row_List2, I need to extract the venue and add to the new column in row_List1.
I tried the below code:
# row_list1['venue'] = np.where(((row_list1['date'] == row_list2['date']) and (row_list1['team_home'] == row_list2['team_home'] or row_list1['team_home'] == row_list2['team_away']) and (row_list1['team_away'] == row_list2['team_away'] or row_list1['team_away'] == row_list2['team_home']) and (row_list1['goals_home'] == row_list2['goals_home'] or row_list1['goals_home'] == row_list2['goals_away']) and (row_list1['goals_away'] == row_list2['goals_away'] or row_list1['goals_away'] == row_list2['goals_home'])), row_list2['venue'], np.NaN)
These are the conditions I need but the above code gives me an error:
ValueError: Can only compare identically-labeled Series objects
Now one more problem is that the team_home and team_away may be switched in row_List2. So I need to check:
if row_list1['team_home'] == row_list2['team_home'] or row_list1['team_home'] == row_list2['team_away']) and (row_list1['team_away'] == row_list2['team_away'] or row_list1['team_away'] == row_list2['team_home']) and (row_list1['goals_home'] == row_list2['goals_home'] or row_list1['goals_home'] == row_list2['goals_away']) and (row_list1['goals_away'] == row_list2['goals_away'] or row_list1['goals_away'] == row_list2['goals_home'])
What I want as an output is:
row_List1:
date team_home team_away goals_home goals_away shootout_win competition venue
1 2018-06-04 India Kenya 3 0 NaN Friendly 2018 Home
2 2018-06-06 Armenia Moldova 0 0 NaN Friendly 2018 Away
3 2018-06-09 Italy Netherlands 1 1 NaN Friendly 2018 Away
Can Anyone please help?
CodePudding user response:
This is kind of hacky but it works. Note that the Armenia-Moldova games don't actually match in your dataframes (they're flipped home/away). I had to .fillna()
before performing the comparison because np.nan
doesn't == np.nan
.
>>> for df in [df1, df2]:
... df.fillna(0, inplace=True)
>>> df1[[df2.drop('venue', axis=1).eq(r).all(axis=1).any() for r in df1.itertuples(index=False)]]
date team_home team_away goals_home goals_away shootout_win competition year
0 2018-06-04 India Kenya 3 0 0.0 Friendly 2018
2 2018-06-09 Italy Netherlands 1 1 0.0 Friendly 2018
CodePudding user response:
Is this what you are looking for?
df = pd.merge(row_List1,row_List2.drop_duplicates(),how = 'left')
Output:
date team_home team_away goals_home goals_away shootout_win \
0 6/4/2018 India Kenya 3 0 NaN
1 6/6/2018 Armenia Moldova 0 0 NaN
2 6/9/2018 Italy Netherlands 1 1 NaN
competition year venue
0 Friendly 2018 Home
1 Friendly 2018 NaN
2 Friendly 2018 Away