I have a data frame column whose values are lists, for example: emaildf['email'][0] = ["[email protected]","[email protected]","[email protected]" ...]
I want to iterate over each row (call it i) and check whether each element (call it j) of i contains a substring, for example:
for i in emaildf['email']:
    for j in i:
        do_something
Here is my code:
Private_Email = []
for index,row in emaildf.iterrows():
    for i in row['email']:
        if len(re.findall("gmail|hotmail|yahoo|msn", row['email'])) > 0:
            Private_Email.append(row['email'])
        else:
            Private_Email.append('No Gmail/Hotmail/MSN/Yahoo domains found.')
emaildf['Private_Email'] = Private_Email
This is the error I'm getting:
----> 4 if len(re.findall("gmail|hotmail|yahoo|msn", row['email'])) > 0:
TypeError: expected string or bytes-like object
Note: Input:
re.findall("gmail|hotmail|yahoo|msn", "[email protected]")
Output:
['gmail']
So that's why I'm checking for the length of the list.
CodePudding user response:
You're getting the TypeError on this line:
----> 4 if len(re.findall("gmail|hotmail|yahoo|msn", row['email'])) > 0:
because row['email'] is a list, not a string, and re.findall expects a string (or bytes-like object) to search in.
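If you do want to keep an explicit loop, here is a minimal sketch of one way to repair it, testing each address (a string) rather than the whole list. The sample addresses and the names pattern, not_found and private_email are made up for illustration. Note that the original inner loop would also append once per address rather than once per row, so the final list would not line up with the data frame; the sketch appends exactly one entry per row.

```python
import re
import pandas as pd

# Hypothetical sample data; the addresses are made up for illustration.
emaildf = pd.DataFrame({
    "email": [
        ["alice@gmail.com", "alice@example.com"],
        ["bob@example.org"],
        ["carol@yahoo.com", "carol@msn.com"],
    ]
})

pattern = re.compile(r"gmail|hotmail|yahoo|msn")
not_found = "No Gmail/Hotmail/MSN/Yahoo domains found."

private_email = []
for addresses in emaildf["email"]:
    # Test each address (a string) on its own, not the whole list.
    matches = [a for a in addresses if pattern.search(a)]
    # Exactly one entry per row, so the result lines up with the DataFrame.
    private_email.append(matches if matches else not_found)

emaildf["Private_Email"] = private_email
```

Here every matching row gets a list of its matching addresses, and rows without a match get the fallback message.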
Now, it seems your particular problem can be solved without even iterating over dataframe rows. Try:
import numpy as np
import pandas as pd
# One row per address; the original row labels are repeated in the index
emails = emaildf['email'].explode()
emails = pd.Series(np.where(emails.str.contains("gmail|hotmail|yahoo|msn", na=False), emails, np.nan), index=emails.index)
emails = emails.groupby(emails.index).apply(lambda x: [y for y in x if pd.notna(y)]).apply(lambda x: x if len(x)>1 else (x[0] if len(x)==1 else np.nan))
emaildf['Private_Email'] = np.where(pd.notna(emails), emails, 'No Gmail/Hotmail/MSN/Yahoo domains found.')
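For comparison, here is a somewhat shorter sketch of the same explode-and-regroup idea. It is a variant, not the code above: it always stores a list of matches (even when only one address matches) instead of unwrapping single matches into a plain string. The sample data and the names exploded, mask, matched, aligned and not_found are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; the addresses are made up for illustration.
emaildf = pd.DataFrame({
    "email": [
        ["alice@gmail.com", "alice@example.com"],
        ["bob@example.org"],
        ["carol@yahoo.com", "carol@msn.com"],
    ]
})
not_found = "No Gmail/Hotmail/MSN/Yahoo domains found."

# One row per address; the original row label is repeated in the index.
exploded = emaildf["email"].explode()
mask = exploded.str.contains("gmail|hotmail|yahoo|msn", na=False)

# Collect the matching addresses back into one list per original row.
matched = exploded[mask.to_numpy()].groupby(level=0).agg(list)

# Rows with no match are absent from `matched`, so reindex yields NaN for them.
aligned = matched.reindex(emaildf.index)
emaildf["Private_Email"] = [
    m if isinstance(m, list) else not_found for m in aligned
]
```

Passing na=False to str.contains treats missing values (for example, from an empty list after explode) as non-matches, so no separate NaN replacement step is needed.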