I have a huge If-Else code that I write in Tableau The compiler simply takes a lot of time to execute this code so I want to move it onto Python.
My df:
match_datetime country league home_team away_team predicted_home_score predicted_away_score predicted_total_score predicted_score_difference
38342 2021-09-15 09:30:00 Australia FFA Cup Edge Hill Gold Coast Knights 1.007927 1.920937 2.928864 0.913010
43807 2021-09-21 09:30:00 Australia FFA Cup Queensland Lions Casuarina 3.333684 0.761920 4.095605 2.571764
49031 2021-09-26 05:00:00 Australia FFA Cup Floreat Athena Adelaide United 0.688574 2.832026 3.520600 2.143452
53094 2021-09-29 10:00:00 Australia FFA Cup ECU Joondalup Adelaide Olympic 2.042965 1.688064 3.731028 0.354901
54080 2021-09-29 10:00:00 Australia FFA Cup ECU Joondalup Adelaide Olympic 1.803334 1.554651 3.357985 0.248683
I have a VLOOKUP table that Interprets these values to provide an output
df_list:
Country League Win DNB O 1.5 U 4.5
84 Australia A-League 1.45 1.45 3.60 2.2
85 Australia A-League Women 1.04 0.65 3.15 2.4
86 Australia Brisbane Premier League 1.04 0.65 3.10 2.4
87 Australia Capital Territory 1.04 0.65 3.10 2.4
88 Australia FFA Cup 1.49 1.49 3.58 2.4
My If-Else code simply put is:
IF df["country"] = df_list["Country"] AND df["league"] = df_list["League"] AND df["predicted_score_difference"] > df_list["Win"] AND df["predicted_total_score"] > df_list["O 1.5"]
THEN "W & O 1.5"
ELSEIF df["country"] = df_list["Country"] AND df["league"] = df_list["League"] AND df["predicted_score_difference"] > df_list["Win"]
THEN "W"
ELSEIF df["country"] = df_list["Country"] AND df["league"] = df_list["League"] AND df["predicted_total_score"] > df_list["O 1.5"]
THEN "O 1.5"
ELSEIF df["country"] = df_list["Country"] AND df["league"] = df_list["League"] AND df["predicted_score_difference"] > df_list["DNB"] AND df["predicted_score_difference"] < df_list["Win"] AND df["predicted_total_score"] > df_list["O 1.5"]
THEN "O 1.5 or DNB"
ELSEIF df["country"] = df_list["Country"] AND df["league"] = df_list["League"] AND df["predicted_score_difference"] > df_list["DNB"] AND df["predicted_score_difference"] < df_list["Win"]
THEN "DNB"
ELSEIF df["country"] = df_list["Country"] AND df["league"] = df_list["League"] AND df["predicted_score_difference"] > df_list["Win"] AND df["predicted_total_score"] < df_list["U 4.5"]
THEN "W & U 4.5"
ELSEIF df["country"] = df_list["Country"] AND df["league"] = df_list["League"] AND df["predicted_total_score"] < df_list["U 4.5"]
THEN "U 4.5"
ELSEIF df["country"] = df_list["Country"] AND df["league"] = df_list["League"] AND df["predicted_score_difference"] < df_list["DNB"]
THEN "N"
Where df_output is the resulting dataframe
e.g. for
match_datetime country league home_team away_team predicted_home_score predicted_away_score predicted_total_score predicted_score_difference
38342 2021-09-15 09:30:00 Australia FFA Cup Edge Hill Gold Coast Knights 1.007927 1.920937 2.928864 0.913010
The df_output["result"] would be "DNB"
How can I write the same code in Python to save time?
CodePudding user response:
Like this?
def func(row):
if row["predicted_score_difference"] > row["Win"] and row["predicted_total_score"] > row["O 1.5"]:
return "W & O 1.5"
if row["predicted_score_difference"] > row["Win"]:
return "W"
if row["predicted_total_score"] > row["O 1.5"]:
return "O 1.5"
if row["predicted_score_difference"] > row["DNB"] and row["predicted_score_difference"] < row["Win"] and row["predicted_total_score"] > row["O 1.5"]:
return "O 1.5 or DNB"
if row["predicted_score_difference"] > row["DNB"] and row["predicted_score_difference"] < row["Win"]:
return "DNB"
if row["predicted_score_difference"] > row["Win"] and row["predicted_total_score"] < row["U 4.5"]:
return "W & U 4.5"
if row["predicted_total_score"] < row["U 4.5"]:
return "U 4.5"
if row["predicted_score_difference"] < row["DNB"]:
return "N"
df = df.reset_index().merge(df_list, how="left", left_on=["country", "league"],right_on=["Country", "League"]).set_index('index')
df['result'] = df.apply(func,axis=1)
print(df)
output:
match_datetime country league home_team away_team predicted_home_score predicted_away_score ... Country League Win DNB O 1.5 U 4.5 result
index ...
38342 2021-09-15 09:30:00 Australia FFA Cup Edge Hill Gold Coast Knights 1.007927e 06 1.920937e 06 ... Australia FFA Cup 1.49 1.49 3.58 2.4 O 1.5
43807 2021-09-21 09:30:00 Australia FFA Cup Queensland Lions Casuarina 3.333684e 06 7.619200e-01 ... Australia FFA Cup 1.49 1.49 3.58 2.4 W & O 1.5
49031 2021-09-26 05:00:00 Australia FFA Cup Floreat Athena Adelaide United 6.885740e-01 2.832026e 06 ... Australia FFA Cup 1.49 1.49 3.58 2.4 W & O 1.5
53094 2021-09-29 10:00:00 Australia FFA Cup ECU Joondalup Adelaide Olympic 2.042965e 06 1.688064e 06 ... Australia FFA Cup 1.49 1.49 3.58 2.4 O 1.5
54080 2021-09-29 10:00:00 Australia FFA Cup ECU Joondalup Adelaide Olympic 1.803334e 06 1.554651e 06 ... Australia FFA Cup 1.49 1.49 3.58 2.4 O 1.5
CodePudding user response:
You could just make these conversions:
IF args
to if args:
,
ELSEIF args
to elif args:
,
AND
to and
,
THEN
to somevar =
,
arg1 = arg2
to arg1 == arg2
Essentially that would work, but your conditions could be heavily optimized.
PSD = 'predicted_score_difference'
PTS = 'predicted_total_score'
out = 'N'
if df["country"] == df_list["Country"] and df["league"] == df_list["League"]:
if df[PSD] > df_list["Win"]:
out = 'W'
elif df[PSD] > df_list["DNB"]:
out = 'DNB'
if df[PTS] > df_list["O 1.5"] or df[PTS] < df_list["U 4.5"]:
out = f'{out} O 1.5'
else:
out = f'{out} U 4.5'
print(out)
CodePudding user response:
if df["country"] == df_list["Country"] and df["league"] == df_list["League"] and df["predicted_score_difference"] > df_list["Win"] and df["predicted_total_score"] > df_list["O 1.5"]:
return "W & O 1.5"
elif df["country"] == df_list["Country"] and df["league"] == df_list["League"] and df["predicted_score_difference"] > df_list["Win"]:
return "W"
...
Basically '=' is converted to '==' in python and '>' and '<' remain the same. 'IF' becomes 'if' and 'ELSEIF' becomes 'elif'. 'AND' becomes 'and' and 'THEN' is replaced with indentation.