Home > Enterprise >  Nested loop throuh a dataframe and compare it's string column values with a list of string tupl
Nested loop throuh a dataframe and compare it's string column values with a list of string tupl

Time:07-23

I'm using Python 3.9.12 to do some data manipulation. I have a dataframe, df that looks like this:

      name1      name2           region_name
0     Salo       saloo           UPPER SINDAN
1     rnelih     NaN             RIMAN TRIST
2     benini     NaN             Lower Tangi
...  ...         ...             ...
999   kremith    kremithin       MIKLAR
1000  Riro       rirron          LOWER BASTI

And a list of tuples ls that is a returned result by cursor.fetchall() from db and it looks like this:

[(47, 'Upper Sindan', ''),
 (48, 'Riman Tah', 'Riman Trist'),
 (27, 'Timbari', 'Timbarlin'),
 (768, 'MIKLAR', ''),
 (769, 'Dindan', ''),
 (770, 'Shina Hardi',''),
 (...,  '.........','...'),
 (921, 'KIMAN DARIB', 'lower basti')]

My goal is to loop through the df and compare every single value of column region_name with the second and third value of the tuple in the entire list ls. If they are equal (case insensitive) then fetch the corresponding number from the first value of the tuple in the list and create a new colmun region_code in df with fetch value that corrosponds to the matched region name. For non matched put empy value and if it could not find any match in the entire list then don't create the column region_code.

I have tried with below blocks of code but all of them failed:

 for item in ls:
        if df['region_name'].str.contains(item[1]).any() or 
     df['region_name'].str.contains(item[2]).any():
            df['region_code']=item[0]
        else:
            df['region_code']=""

#Here I get empty region_code in the df

for index, row1 in df.iterrows():
    for item in ls:
        if (row1[2] == item[1]) or (row1[2] == item[2]):
            df["region_code"] = item[0]
        else:
            df["region_code"] = ""

#Here I get empty region_code in the df

And for this one:

for index, row1 in df.iterrows():
    for item in ls:
        if (row1[2].str.lower() == item[1].str.lower()) or (row1[2].str.lower() == item[1].str.lower()):
            df["region_code"] = item [0]
        else:
            df["region_code"] = ""

#Here I get: AttributeError: 'str' object has no attribute 'str'

I really appreciate if someone can point out what I'm doing wrong or a better way to do this? Thank you for your help!

CodePudding user response:

Use Series.map with dictionary of lowercase values by DataFrame.melt:

L = [(47, 'Upper Sindan', ''),
 (48, 'Riman Tah', 'Riman Trist'),
 (27, 'Timbari', 'Timbarlin'),
 (768, 'MIKLAR', ''),
 (769, 'Dindan', ''),
 (770, 'Shina Hardi',''),
 (921, 'KIMAN DARIB', 'lower basti')]

d = pd.DataFrame(L).melt(0).set_index('value')[0].rename(str.lower).to_dict()
print (d)
{'upper sindan': 47, 'riman tah': 48, 'timbari': 27, 'miklar': 768, 'dindan': 769, 
 'shina hardi': 770, 'kiman darib': 921, '': 770, 'riman trist': 48,
 'timbarlin': 27, 'lower basti': 921}

And then:

df['region_code'] = df['region_name'].str.lower().map(d).fillna('')
  • Related