Home > database >  What would be the correct merge function in this pandas dataframe?
What would be the correct merge function in this pandas dataframe?

Time:05-08

df.head()
index       match_datetime country           league          home_team                  away_team  home_odds  draw_odds  away_odds  predicted_home_score  predicted_away_score  predicted_btts  predicted_class_btts  predicted_bttso2.5  predicted_class_bttso2.5  predicted_bttsu2.5  predicted_class_bttsu2.5  predicted_o2.5  predicted_class_o2.5  predicted_u2.5  predicted_class_u2.5  predicted_o1.5  predicted_class_o1.5  predicted_u1.5  predicted_class_u1.5                    run_time  predicted_o4  predicted_class_o4  predicted_u4  predicted_class_u4  predicted_draw  predicted_class_draw  predicted_total_score  predicted_score_difference   Win   DNB  O 1.5  U 4.5             datetime  home_score  away_score selection predicted_result result  total_score  score_difference result_match selection_match
0      0  2021-08-06 06:00:00  Russia  FNL 2 - Group 2          Yenisey 2          Lokomotiv-Kazanka       2.61       3.51       2.32                  1.45                  1.73            0.66                     1                0.56                         0                0.11                         0            0.62                     1            0.33                     0            0.76                     1            0.24                     0  2021-08-07 00:04:36.326391           NaN                 NaN           NaN                 NaN             NaN                   NaN                   3.18                        0.28  3.98  2.06   4.06   1.80  2021-08-06 06:00:00         2.0         1.0         N             Away   Home          3.0               1.0     No Match         No Sel.
1      1  2021-08-06 07:00:00  Russia     Youth League           Ural U19  Krylya Sovetov Samara U19       2.10       3.68       2.98                  1.25                  1.33            0.44                     0                0.40                         0                0.04                         0            0.56                     1            0.47                     0            0.60                     1            0.35                     0  2021-08-07 00:04:36.326391           NaN                 NaN           NaN                 NaN             NaN                   NaN                   2.58                        0.08  2.46  2.46   4.59   2.40  2021-08-06 07:00:00         1.0         1.0         N             Away   Draw          2.0               0.0     No Match         No Sel.
2      2  2021-08-06 08:00:00   World    Club Friendly             Alaves                    Al Nasr       1.94       3.63       3.42                  1.84                  1.39            0.68                     1                0.42                         1                0.19                         0            0.64                     1            0.42                     0            0.66                     1            0.34                     0  2021-08-07 00:04:36.326391           NaN                 NaN           NaN                 NaN             NaN                   NaN                   3.23                        0.45  3.69  3.69   6.50   2.06                  NaN         NaN         NaN         N             Home    NaN          NaN               NaN    No Result         No Sel.
3      3  2021-08-06 08:30:00   China       Jia League  Chengdu Rongcheng          Shenyang Urban FC       1.16       6.87      15.29                  3.07                  0.52            0.50                     0                0.37                         0                0.04                         0            0.81                     1            0.26                     0            0.84                     1            0.10                     0  2021-08-07 00:04:36.326391           NaN                 NaN           NaN                 NaN             NaN                   NaN                   3.59                        2.55  2.42  1.85   4.80   2.20  2021-08-06 08:30:00         2.0         0.0         W             Home   Home          2.0               2.0        Match           Match
4      4  2021-08-06 08:30:00   China     Super League           Wuhan FC       Tianjin Jinmen Tiger       1.56       3.69       6.71                  1.90                  0.68            0.50                     0                0.40                         0                0.21                         0            0.51                     0            0.58                     1            0.63                     1            0.37                     0  2021-08-07 00:04:36.326391           NaN                 NaN           NaN                 NaN             NaN                   NaN                   2.58                        1.22  2.08  1.50   3.20   2.30  2021-08-06 08:30:00         1.0         1.0         N             Home   Draw          2.0               0.0     No Match         No Sel.

import pandas as pd
df['run_time'] = pd.to_datetime(df['run_time'])
df['match_date'] = pd.to_datetime(df['match_datetime']).dt.date
# Keeping the top 2 dates for every runtime to have a smaller dataframe
df_tt = df[['run_time', 'match_date']]
df_tt = df_tt.drop_duplicates()
df_tt = df_tt.sort_values(['run_time', 'match_date'], ascending=[True, True])
df_tt = df_tt.loc[df_tt.groupby(df['run_time']).head(2).index].sort_index()

df_tt.head()
                       run_time  match_date
0    2021-08-07 00:04:36.326391  2021-08-06
188  2021-08-07 00:04:36.326391  2021-08-05
758  2021-08-07 10:50:34.574040  2021-08-07
1407 2021-08-07 10:50:34.574040  2021-08-06
1884 2021-08-07 16:56:22.322338  2021-08-07

I want the resulting dataframe top be a filtered dataframe consisting of the values of df_tt

df_tab = df.reset_index().merge(df_tt, how="left", left_on=['run_time', 'match_date'],
                                right_on=['run_time', 'match_date']).set_index('index')

I am unsure if I am writing the merge function correctly however I am not getting the dataframe I want rather I am getting the original dataframe df as it is.

df_tab.head()

level_0       match_datetime country           league          home_team                  away_team  home_odds  draw_odds  away_odds  predicted_home_score  predicted_away_score  predicted_btts  predicted_class_btts  predicted_bttso2.5  predicted_class_bttso2.5  predicted_bttsu2.5  predicted_class_bttsu2.5  predicted_o2.5  predicted_class_o2.5  predicted_u2.5  predicted_class_u2.5  predicted_o1.5  predicted_class_o1.5  predicted_u1.5  predicted_class_u1.5                   run_time  predicted_o4  predicted_class_o4  predicted_u4  predicted_class_u4  predicted_draw  predicted_class_draw  predicted_total_score  predicted_score_difference   Win   DNB  O 1.5  U 4.5             datetime  home_score  away_score selection predicted_result result  total_score  score_difference result_match selection_match  match_date
index                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
0            0  2021-08-06 06:00:00  Russia  FNL 2 - Group 2          Yenisey 2          Lokomotiv-Kazanka       2.61       3.51       2.32                  1.45                  1.73            0.66                     1                0.56                         0                0.11                         0            0.62                     1            0.33                     0            0.76                     1            0.24                     0 2021-08-07 00:04:36.326391           NaN                 NaN           NaN                 NaN             NaN                   NaN                   3.18                        0.28  3.98  2.06   4.06   1.80  2021-08-06 06:00:00         2.0         1.0         N             Away   Home          3.0               1.0     No Match         No Sel.  2021-08-06
1            1  2021-08-06 07:00:00  Russia     Youth League           Ural U19  Krylya Sovetov Samara U19       2.10       3.68       2.98                  1.25                  1.33            0.44                     0                0.40                         0                0.04                         0            0.56                     1            0.47                     0            0.60                     1            0.35                     0 2021-08-07 00:04:36.326391           NaN                 NaN           NaN                 NaN             NaN                   NaN                   2.58                        0.08  2.46  2.46   4.59   2.40  2021-08-06 07:00:00         1.0         1.0         N             Away   Draw          2.0               0.0     No Match         No Sel.  2021-08-06
2            2  2021-08-06 08:00:00   World    Club Friendly             Alaves                    Al Nasr       1.94       3.63       3.42                  1.84                  1.39            0.68                     1                0.42                         1                0.19                         0            0.64                     1            0.42                     0            0.66                     1            0.34                     0 2021-08-07 00:04:36.326391           NaN                 NaN           NaN                 NaN             NaN                   NaN                   3.23                        0.45  3.69  3.69   6.50   2.06                  NaN         NaN         NaN         N             Home    NaN          NaN               NaN    No Result         No Sel.  2021-08-06
3            3  2021-08-06 08:30:00   China       Jia League  Chengdu Rongcheng          Shenyang Urban FC       1.16       6.87      15.29                  3.07                  0.52            0.50                     0                0.37                         0                0.04                         0            0.81                     1            0.26                     0            0.84                     1            0.10                     0 2021-08-07 00:04:36.326391           NaN                 NaN           NaN                 NaN             NaN                   NaN                   3.59                        2.55  2.42  1.85   4.80   2.20  2021-08-06 08:30:00         2.0         0.0         W             Home   Home          2.0               2.0        Match           Match  2021-08-06
4            4  2021-08-06 08:30:00   China     Super League           Wuhan FC       Tianjin Jinmen Tiger       1.56       3.69       6.71                  1.90                  0.68            0.50                     0                0.40                         0                0.21                         0            0.51                     0            0.58                     1            0.63                     1            0.37                     0 2021-08-07 00:04:36.326391           NaN                 NaN           NaN                 NaN             NaN                   NaN                   2.58                        1.22  2.08  1.50   3.20   2.30  2021-08-06 08:30:00         1.0         1.0         N             Home   Draw          2.0               0.0     No Match         No Sel.  2021-08-06

How can I write the merge function correctly?

CodePudding user response:

If all columns in df_tt are used as key columns, no value/additional columns are appended to df.

If you try

df.reset_index().merge(df_tt, how="left", on='match_date').set_index('index')

you should be able to see run_time_x and run_time_y in result.

CodePudding user response:

This is what I tried and works.

import pandas as pd
df['run_time'] = pd.to_datetime(df['run_time'])
df['match_date'] = pd.to_datetime(df['match_datetime']).dt.date
# Keeping the top 2 dates for every runtime to have a smaller dataframe
df_tt = df[['run_time', 'match_date']]
df_tt = df_tt.drop_duplicates()
df_tt = df_tt.sort_values(['run_time', 'match_date'], ascending=[True, True])
df_tt = df_tt.loc[df_tt.groupby(df['run_time']).head(2).index].sort_index()

df_tt.head()
                       run_time  match_date
0    2021-08-07 00:04:36.326391  2021-08-06
188  2021-08-07 00:04:36.326391  2021-08-05
758  2021-08-07 10:50:34.574040  2021-08-07
1407 2021-08-07 10:50:34.574040  2021-08-06
1884 2021-08-07 16:56:22.322338  2021-08-07

I then concat the two columns as I was unsure how merge works with multiple columns

df['concat'] = df['run_time'].astype(str)   df['match_date'].astype(str)
df_tt['concat'] = df_tt['run_time'].astype(str)   df_tt['match_date'].astype(str)

Then I did a pd.merge

df_tab = df.merge(df_tt, left_on='concat', right_on='concat')

That resulted in a smaller and a filtered dataframe

 index       match_datetime country           league          home_team                  away_team  home_odds  draw_odds  away_odds  predicted_home_score  predicted_away_score  predicted_btts  predicted_class_btts  predicted_bttso2.5  predicted_class_bttso2.5  predicted_bttsu2.5  predicted_class_bttsu2.5  predicted_o2.5  predicted_class_o2.5  predicted_u2.5  predicted_class_u2.5  predicted_o1.5  predicted_class_o1.5  predicted_u1.5  predicted_class_u1.5                 run_time_x  predicted_o4  predicted_class_o4  predicted_u4  predicted_class_u4  predicted_draw  predicted_class_draw  predicted_total_score  predicted_score_difference   Win   DNB  O 1.5  U 4.5             datetime  home_score  away_score selection predicted_result result  total_score  score_difference result_match selection_match match_date_x                                concat                 run_time_y match_date_y
0      0  2021-08-06 06:00:00  Russia  FNL 2 - Group 2          Yenisey 2          Lokomotiv-Kazanka       2.61       3.51       2.32                  1.45                  1.73            0.66                     1                0.56                         0                0.11                         0            0.62                     1            0.33                     0            0.76                     1            0.24                     0 2021-08-07 00:04:36.326391           NaN                 NaN           NaN                 NaN             NaN                   NaN                   3.18                        0.28  3.98  2.06   4.06   1.80  2021-08-06 06:00:00         2.0         1.0         N             Away   Home          3.0               1.0     No Match         No Sel.   2021-08-06  2021-08-07 00:04:36.3263912021-08-06 2021-08-07 00:04:36.326391   2021-08-06
1      1  2021-08-06 07:00:00  Russia     Youth League           Ural U19  Krylya Sovetov Samara U19       2.10       3.68       2.98                  1.25                  1.33            0.44                     0                0.40                         0                0.04                         0            0.56                     1            0.47                     0            0.60                     1            0.35                     0 2021-08-07 00:04:36.326391           NaN                 NaN           NaN                 NaN             NaN                   NaN                   2.58                        0.08  2.46  2.46   4.59   2.40  2021-08-06 07:00:00         1.0         1.0         N             Away   Draw          2.0               0.0     No Match         No Sel.   2021-08-06  2021-08-07 00:04:36.3263912021-08-06 2021-08-07 00:04:36.326391   2021-08-06
2      2  2021-08-06 08:00:00   World    Club Friendly             Alaves                    Al Nasr       1.94       3.63       3.42                  1.84                  1.39            0.68                     1                0.42                         1                0.19                         0            0.64                     1            0.42                     0            0.66                     1            0.34                     0 2021-08-07 00:04:36.326391           NaN                 NaN           NaN                 NaN             NaN                   NaN                   3.23                        0.45  3.69  3.69   6.50   2.06                  NaN         NaN         NaN         N             Home    NaN          NaN               NaN    No Result         No Sel.   2021-08-06  2021-08-07 00:04:36.3263912021-08-06 2021-08-07 00:04:36.326391   2021-08-06
3      3  2021-08-06 08:30:00   China       Jia League  Chengdu Rongcheng          Shenyang Urban FC       1.16       6.87      15.29                  3.07                  0.52            0.50                     0                0.37                         0                0.04                         0            0.81                     1            0.26                     0            0.84                     1            0.10                     0 2021-08-07 00:04:36.326391           NaN                 NaN           NaN                 NaN             NaN                   NaN                   3.59                        2.55  2.42  1.85   4.80   2.20  2021-08-06 08:30:00         2.0         0.0         W             Home   Home          2.0               2.0        Match           Match   2021-08-06  2021-08-07 00:04:36.3263912021-08-06 2021-08-07 00:04:36.326391   2021-08-06
4      4  2021-08-06 08:30:00   China     Super League           Wuhan FC       Tianjin Jinmen Tiger       1.56       3.69       6.71                  1.90                  0.68            0.50                     0                0.40                         0                0.21                         0            0.51                     0            0.58                     1            0.63                     1            0.37                     0 2021-08-07 00:04:36.326391           NaN                 NaN           NaN                 NaN             NaN                   NaN                   2.58                        1.22  2.08  1.50   3.20   2.30  2021-08-06 08:30:00         1.0         1.0         N             Home   Draw          2.0               0.0     No Match         No Sel.   2021-08-06  2021-08-07 00:04:36.3263912021-08-06 2021-08-07 00:04:36.326391   2021-08-06

probably not an elegant solution but there can be a better way to approach this

  • Related