I'm using Python 3.9.12 to do some data manipulation. I have a dataframe, df
that looks like this:
name1 name2 region_name
0 Salo saloo UPPER SINDAN
1 rnelih NaN RIMAN TRIST
2 benini NaN Lower Tangi
... ... ... ...
999 kremith kremithin MIKLAR
1000 Riro rirron LOWER BASTI
And a list of tuples ls
that is a returned result by cursor.fetchall()
from db and it looks like this:
[(47, 'Upper Sindan', ''),
(48, 'Riman Tah', 'Riman Trist'),
(27, 'Timbari', 'Timbarlin'),
(768, 'MIKLAR', ''),
(769, 'Dindan', ''),
(770, 'Shina Hardi',''),
(..., '.........','...'),
(921, 'KIMAN DARIB', 'lower basti')]
My goal is to loop through the df
and compare every single value of column region_name
with the second and third value of the tuple in the entire list ls
. If they are equal (case insensitive) then fetch the corresponding number from the first value of the tuple in the list and create a new colmun region_code
in df with fetch value that corrosponds to the matched region name. For non matched put empy value and if it could not find any match in the entire list then don't create the column region_code
.
I have tried with below blocks of code but all of them failed:
for item in ls:
if df['region_name'].str.contains(item[1]).any() or
df['region_name'].str.contains(item[2]).any():
df['region_code']=item[0]
else:
df['region_code']=""
#Here I get empty region_code in the df
for index, row1 in df.iterrows():
for item in ls:
if (row1[2] == item[1]) or (row1[2] == item[2]):
df["region_code"] = item[0]
else:
df["region_code"] = ""
#Here I get empty region_code in the df
And for this one:
for index, row1 in df.iterrows():
for item in ls:
if (row1[2].str.lower() == item[1].str.lower()) or (row1[2].str.lower() == item[1].str.lower()):
df["region_code"] = item [0]
else:
df["region_code"] = ""
#Here I get: AttributeError: 'str' object has no attribute 'str'
I really appreciate if someone can point out what I'm doing wrong or a better way to do this? Thank you for your help!
CodePudding user response:
Use Series.map
with dictionary of lowercase values by DataFrame.melt
:
L = [(47, 'Upper Sindan', ''),
(48, 'Riman Tah', 'Riman Trist'),
(27, 'Timbari', 'Timbarlin'),
(768, 'MIKLAR', ''),
(769, 'Dindan', ''),
(770, 'Shina Hardi',''),
(921, 'KIMAN DARIB', 'lower basti')]
d = pd.DataFrame(L).melt(0).set_index('value')[0].rename(str.lower).to_dict()
print (d)
{'upper sindan': 47, 'riman tah': 48, 'timbari': 27, 'miklar': 768, 'dindan': 769,
'shina hardi': 770, 'kiman darib': 921, '': 770, 'riman trist': 48,
'timbarlin': 27, 'lower basti': 921}
And then:
df['region_code'] = df['region_name'].str.lower().map(d).fillna('')