So I have the following two data sets:
df1:
index | datetime | team | other conditions |
---|---|---|---|
1 | 2013-4-1 | team a | 73.3 |
2 | 2013-4-1 | team b | 34.5 |
3 | 2014-3-12 | team c | 57.2 |
df2:
index | datetime | home | away | other conditions |
---|---|---|---|---|
1 | 2013-4-1 | team a | team c | 73.3 |
2 | 2013-4-1 | team d | team e | 73.3 |
3 | 2014-3-12 | team f | team a | 348.2 |
now I need to merge these two in the following way:
for df1 find the row in df2 in which df1['datetime']==df2['datetime'] AND df1['team']== df2['home'] OR df2['away']
then append the row from df1 to the matching row in df 2
df2 is larger than df1 so if all rows are matched there should be a view empty ones remaining which isn't a problem.
big thanks ahead for any useful tips otherwise I have to find the matching ones by hand for >4000 rows
CodePudding user response:
Without seeing more of your data this will give you the results you are looking for
data = {
'datetime' : ['2013-4-1', '2013-4-1', '2014-3-12', '2014-4-12'],
'team' : ['team a', 'team b', 'team c', 'team b'],
'other conditions' : [73.3, 34.5, 57.2, 384.3]
}
data2 = {
'datetime' : ['2013-4-1', '2013-4-1', '2014-3-12', '2014-4-12'],
'home' : ['team a', 'team d', 'team f', 'team g'],
'away' : ['team c', 'team e', 'team a', 'team b'],
'other conditions' : [73.3, 34.5, 57.2, 38.3]
}
df1 = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
df3 = df2.merge(df1, how='left', on = 'datetime', suffixes = ('','_y'))
comp_column = np.where((df3['team'] == df3['home']) | (df3['team'] == df3['away']), True, False)
df3['check'] = comp_column
df3 = df3.loc[df3['check'] == True]
df3.drop((['other conditions_y', 'check']), axis = 1, inplace = True)
df3
This will join your df1 and df2 together on the datetime which might create some bad joins, but if you are only looking for where team is either in home or away the np.where at the bottom would resolve this for you.
CodePudding user response:
df1['home']=df1['team']
df1['away']=df1['team']
dfhome = df1.merge(df2, on=['datex', 'home'], how='inner')
dfaway = df1.merge(df2, on=['datex', 'away'], how='inner')
then put together dfhome and dfaway.