Home > Enterprise >  How to loop a pandas series(list type) and check if the string in list matches with another series i
How to loop a pandas series(list type) and check if the string in list matches with another series i

Time:08-16

I have two Data Frames:

**df1-- data1 = {'Company':['ADIDAS','NIKE','PUMA','NEW BALANCE','UNDER ARMOUR'],
   'Keywords':['COPA, PREDATOR, ORIGINALS, SPEEDFLOW','MERCURIAL, SUPERSTAR, VAPOR','ULTRA, FUTURE, RAPIDO','FURON','TEKELA'],
   'Suppliers':['', '', '[STADIUM, JD]', '', '']}


| Company    |Keywords                              | Suppliers   |
| ---------- |--------------------------------------|-------------|
| ADIDAS     |[COPA, PREDATOR, ORIGINALS, SPEEDFLOW]| <NA>        |
| NIKE       |[MERCURIAL, SUPERSTAR, VAPOR]         | <NA>        |
| PUMA       |[ULTRA, FUTURE, RAPIDO]               |[STADIUM, JD]|
|NEW BALANCE |[FURON]                               | <NA>        |
|UNDER ARMOUR|[TEKELA]                              | <NA>        |

**df2 --data2 = {'Names':['ADIDAS PREDATOR 17.1','NIKE MERCURIAL 2020','NIKE VAPOR 2021','NEW BALANCE FURON','PUMA RAPIDO 21.3', 'PUMA RAPIDO 21.4'],
   'Supplier':['ADIDAS','NIKE','NIKE','JD','STADIUM', 'JD'], 'Company.1': ['', '', '', '', '', '']}**
| Names                | Supplier | Company.1 |
| -------------------- | ---------|-----------|
| ADIDAS PREDATOR 17.1 |ADIDAS    | <NA>      |
| NIKE MERCURIAL 2020  |NIKE      | <NA>      |
| NIKE VAPOR 2021      |NIKE      | <NA>      |
| NEW BALANCE FURON    |JD        | <NA>      |
| PUMA RAPIDO 21.3     |STADIUM   | <NA>      |
| PUMA RAPIDO 21.4     |JD        | <NA>      |

Goal is to check if df2[Names] contains any word from df1[keywords], if YES then check if df1[suppliers] and df2[supplier] are equal, if YES then assign df2[Company.1] as df1[Company]. (If df1[Suppliers] is empty then no need to check for supplier)

Here is some code I've tried.(Print statement is just a reference for me)

for i in range(len(df1["Keywords"])):       
    for j in range(len(df1["Keywords"][i])):
        for name_index in range(len(df2["Product_name"])):
            if df1["Keywords"][i][j].strip() in df2["Product_name"][name_index]:
                print("YES "   df1["Keywords"][i][j]   " in "  df2["Product_name"][name_index])  

  # Now need to check if suppliers are same

Expected Output:

| Names                | Supplier | Company.1 |
| -------------------- | ---------|-----------|
| ADIDAS PREDATOR 17.1 |ADIDAS    | ADIDAS    |
| NIKE MERCURIAL 2020  |NIKE      | NIKE      |
| NIKE VAPOR 2021      |NIKE      | NIKE      |
| NEW BALANCE FURON    |JD        | NEW BALANCE|
| PUMA RAPIDO 21.3     |STADIUM   | PUMA      |

How to add Company name to Company.1 using the satisfied condition?

CodePudding user response:

Is Suppliers in data1 supposed to have <NA> values, except for [STADIUM, JD]? If so, I'm unsure how you've gotten the Company.1 values in your expected output. None of the values in data2's Supplier are <NA>, and — of the one that is not <NA> in data1 — the Keywords do not match the Names in data2.

Regardless, I believe I have the gist of what you're looking for.

keywords: str = "Keywords"
names: str = "Names"

# 1 - Compare the values of data2.Name to data1.Keywords
data1[keywords] = [i.split(", ") for i in data1.get(keywords)]
data2[names] = [i.split() for i in data2.get(names)]
data2["_match_keywords"] = [any(i in name for i in keyword) for name, keyword in zip(data2.get(names), data1.get(keywords))]

# Out - data2
# {'Names': [['ADIDAS', 'PREDATOR', '17.1'], ['NIKE', 'MERCURIAL', '2020'], #['NIKE', 'VAPOR', '2021'],
#           ['NEW', 'BALANCE', 'FURON'], ['PUMA', 'RAPIDO', '21.3']],
# 'Supplier': ['ADIDAS', 'NIKE', 'NIKE', 'JD', 'STADIUM'], 'Company.1': ['', # '', '', '', ''],
# '_match_keywords': [True, True, False, True, False]}

# 2 - Compare data2.Supplier to data1.Suppliers
data2["_match_supplier"] = [any(i in s1 for i in s2) for s2, s1 in zip(data2.get("Supplier"), data1.get("Suppliers"))]

# Out
# {'Names': [['ADIDAS', 'PREDATOR', '17.1'], ['NIKE', 'MERCURIAL', '2020'], ['NIKE', 'VAPOR', '2021'],
#            ['NEW', 'BALANCE', 'FURON'], ['PUMA', 'RAPIDO', '21.3']],
#  'Supplier': ['ADIDAS', 'NIKE', 'NIKE', 'JD', 'STADIUM'], 'Company.1': ['', '', '', '', ''],
#  '_match_keywords': [True, True, False, True, False], '_match_supplier': [False, False, True, False, False]}

# 3 - If Keyword and Supplier are equal, then assign data1.Company to data2.Company.1
for match, org in zip(data2.get("_match_keywords"), data2.get("_match_supplier")):
    if match == org:
        data2["Company.1"] = org

# 4 - Make the frames and drop the helper columns (_match_keywords, _match_supplier)
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df2 = df2[[col for col in df2.columns if not col.startswith("_")]]

df1

Company Keywords Suppliers
ADIDAS ['COPA', 'PREDATOR', 'ORIGINALS', 'SPEEDFLOW']
NIKE ['MERCURIAL', 'SUPERSTAR', 'VAPOR']
PUMA ['ULTRA', 'FUTURE', 'RAPIDO'] [STADIUM, JD]
NEW BALANCE ['FURON']
UNDER ARMOUR ['TEKELA']

df2 w/conditions

Names Supplier Company.1 _match_keywords _match_supplier
['ADIDAS', 'PREDATOR', '17.1'] ADIDAS True False
['NIKE', 'MERCURIAL', '2020'] NIKE True False
['NIKE', 'VAPOR', '2021'] NIKE False True
['NEW', 'BALANCE', 'FURON'] JD True False
['PUMA', 'RAPIDO', '21.3'] STADIUM False False

df2 w/helpers dropped

| Names                          | Supplier   | Company.1   |
|--------------------------------|------------|-------------|
| ['ADIDAS', 'PREDATOR', '17.1'] | ADIDAS     |             |
| ['NIKE', 'MERCURIAL', '2020']  | NIKE       |             |
| ['NIKE', 'VAPOR', '2021']      | NIKE       |             |
| ['NEW', 'BALANCE', 'FURON']    | JD         |             |
| ['PUMA', 'RAPIDO', '21.3']     | STADIUM    |             |

CodePudding user response:

Here is the code that is working for me. If anybody has any suggestion on improving my code, I would appreciate it!

for i in range(len(df1["Keywords"])):       
  for j in range(len(df1["Keywords"][i])):
    for name_index in range(len(df2["Product_name"])):
        if df1["Keywords"][i][j].strip() in df2["Product_name"][name_index]:
            print("YES,"   df1["Keywords"][i][j]   " in "  df2["Product_name"][name_index])       

            print("---Checking Supplier-------")
            df1["Suppliers"].fillna("Empty", inplace = True)
            if df1["Suppliers"][i] == "Empty":
                print("---Supplier empty so addding brand name")
                df2["Company.1"][name_index] = df1["Company"][i]
                print("---Brand added--")
            else:
                print("---Supplier not empty so looking for match")
                for suppliers in df1["Suppliers"][i]:                               
                    if df2["Supplier"][name_index] in suppliers:
                        print("Supplier matched", end =" ")
                        print(df2["Supplier"][name_index], suppliers)
                        df2["Company.1"][name_index] = df1["Company"][i]
                        print("Brand added")
                        break                                                 
                    else:
                        df2["Company.1"][name_index] = ("Unmapped")
                        print("Supplier not matched so unmapped.", end = " ")
                        print(df2["Supplier"][name_index], suppliers)
  • Related