I have a set of premium locations and need to assign a priority to each business using the following logic. Premium locations = Beirut, Saida
- If premium location and business score 0.75, then priority = 1
- If premium location and business score 0.5, then priority = 2
- If there is no premium location, then priority = 3
- A location name must match one of the premium locations exactly. For example, BeirutX should not be considered premium.
Input example:
Business Location BusScore
X. Beirut, Aley 0.75
Y. Saida, Sour 0.5
Z. Baalbeck,Tripoli 0.75
D. Tripoli. 0.75
Desired Output:
Business Location. BusScore Priority
X. Beirut, Aley. 0.75 1
Y. Saida, Sour. 0.5 2
Z. Baalbeck,Tripoli 0.75 3
D. Tripoli. 0.75 3
CodePudding user response:
You have to make sure that your Location column is a list of str. Why are some entries split by a comma and a space, and others by only a comma? Why do some locations end with a "."? Make sure to remove those inconsistencies first. You can then define a function that describes your priority logic and apply it to each row:
import pandas as pd

df = pd.DataFrame([
    ["X.", "Beirut, Aley", 0.75],
    ["Y.", "Saida, Sour", 0.5],
    ["Z.", "Baalbeck, Tripoli", 0.75],
    ["D.", "Tripoli", 0.75]
], columns=["Business", "Location", "BusScore"])

# IMPORTANT: You have to change the line below properly depending on the formatting of your location column.
df["Location"] = df["Location"].str.split(", ")

# Actual logic you can use:
def premium_location(location):
    # Exact match only, so "BeirutX" is not treated as premium.
    return location in {"Beirut", "Saida"}

def priority(business):
    # A business is premium if any of its locations is premium.
    premium = any(premium_location(location) for location in business["Location"])
    if premium and business["BusScore"] == 0.75:
        return 1
    if premium and business["BusScore"] == 0.5:
        return 2
    if not premium:
        return 3
    # A premium location with any other score is not covered by the rules in the question.
    return None

df["priority"] = df.apply(priority, axis=1)
Output:
  Business             Location  BusScore  priority
0       X.       [Beirut, Aley]      0.75         1
1       Y.        [Saida, Sour]      0.50         2
2       Z.  [Baalbeck, Tripoli]      0.75         3
3       D.            [Tripoli]      0.75         3
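If the raw column really looks like the question's sample (mixed ", " and "," separators, stray trailing periods), the cleanup step could be sketched as below. This is only a minimal sketch under that assumption: parse_locations, has_premium and PREMIUM are hypothetical names that are not part of the original answer, and the stripping rule should be adapted to the real data.

```python
# Hypothetical constant: the premium locations named in the question.
PREMIUM = {"Beirut", "Saida"}


def parse_locations(text):
    """Split a raw location string on commas, trimming spaces and periods.

    Assumption: separators are commas, and any leading/trailing spaces or
    periods around a name are noise rather than part of the name.
    """
    parts = (part.strip(" .") for part in text.split(","))
    return [part for part in parts if part]


def has_premium(text):
    """Return True if any parsed location matches a premium name exactly."""
    return any(location in PREMIUM for location in parse_locations(text))


# Raw strings taken from the question, plus the "BeirutX" counterexample.
samples = ["Beirut, Aley.", "Saida, Sour.", "Baalbeck,Tripoli", "Tripoli.", "BeirutX"]
for sample in samples:
    print(sample, parse_locations(sample), has_premium(sample))
```

With the DataFrame above, df["Location"].apply(parse_locations) could stand in for the str.split(", ") line, so the priority function still receives a list of str for each row.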