Pandas cells with multiple values

Time:12-24

I have a set of premium locations (Beirut and Saida) and need to assign a priority to each business using the following logic:

  • If a business has a premium location and a business score of 0.75, then priority = 1
  • If a business has a premium location and a business score of 0.5, then priority = 2
  • If a business has no premium location, then priority = 3
  • A location name must match one of the premium locations exactly. For example, BeirutX should not count as premium.
Input example:
Business  Location           BusScore
    X.    Beirut, Aley        0.75
    Y.    Saida, Sour          0.5
    Z.    Baalbeck,Tripoli    0.75
    D.    Tripoli.            0.75
Desired Output:
Business  Location           BusScore  Priority
    X.    Beirut, Aley.       0.75        1
    Y.    Saida, Sour.         0.5        2
    Z.    Baalbeck,Tripoli    0.75        3
    D.    Tripoli.            0.75        3

Answer:

You have to make sure that each entry in your Location column is a list of str. Why are some entries split by a comma and a space, and others by only a comma? Why do some locations end with a period? Clean those up first. You can then define a function that describes your priority logic and apply it to each row:

import pandas as pd

df = pd.DataFrame([
    ["X.", "Beirut, Aley", 0.75],
    ["Y.", "Saida, Sour", 0.5],
    ["Z.", "Baalbeck, Tripoli", 0.75],
    ["D.", "Tripoli", 0.75]
], columns=["Business", "Location", "BusScore"])

# IMPORTANT: Adjust the line below to match the actual formatting of your Location column.
df["Location"] = df["Location"].str.split(", ")

# Actual logic you can use:
def premium_location(location):
    return location in {"Beirut", "Saida"}

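def priority(business):
    # A business is premium if any of its locations is an exact match.
    premium = any(premium_location(location) for location in business["Location"])
    if not premium:
        return 3

    if business["BusScore"] == 0.75:
        return 1

    if business["BusScore"] == 0.5:
        return 2

    # The rules do not cover premium locations with other scores,
    # so those rows get None (a missing value in the column).
    return None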

df["priority"] = df.apply(priority, axis=1)

Output:

  Business             Location  BusScore  priority
0       X.       [Beirut, Aley]      0.75         1
1       Y.        [Saida, Sour]      0.50         2
2       Z.  [Baalbeck, Tripoli]      0.75         3
3       D.            [Tripoli]      0.75         3
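
The sample in the question mixes separators ("Beirut, Aley" vs. "Baalbeck,Tripoli") and has trailing periods ("Tripoli."), so splitting on ", " alone would leave tokens such as "Baalbeck,Tripoli" or "Tripoli." that never match. Below is a minimal sketch of one way to normalize the raw strings first. It assumes a period is never part of a real location name. The helper name clean_locations, the PREMIUM constant, and the extra row "E." are made up for illustration.

```python
import pandas as pd

PREMIUM = {"Beirut", "Saida"}

def clean_locations(raw):
    # Hypothetical helper: split on commas, then trim spaces and periods
    # from each name. Assumes "." never belongs to a real location name.
    names = (name.strip(" .") for name in raw.split(","))
    return [name for name in names if name]

raw_df = pd.DataFrame({
    "Business": ["X.", "Y.", "Z.", "D.", "E."],
    "Location": ["Beirut, Aley.", "Saida, Sour.", "Baalbeck,Tripoli",
                 "Tripoli.", "BeirutX, Aley"],
    "BusScore": [0.75, 0.5, 0.75, 0.75, 0.75],
})

raw_df["Location"] = raw_df["Location"].apply(clean_locations)

# Exact membership test, so "BeirutX" is not treated as "Beirut".
raw_df["premium"] = raw_df["Location"].apply(
    lambda names: not PREMIUM.isdisjoint(names)
)
```

After this step the Location column holds clean lists of names, so the priority function above should work on it unchanged.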