TypeError: expected string or bytes-like object for list in list

I have a DataFrame column in which each cell holds a list, for example: emaildf['email'][0] = ["[email protected]","[email protected]","[email protected]" ...]

I want to iterate over each row (call it i) and check whether each item in that row (call it j) contains a substring, for example:

for i in emaildf['email']:
    for j in i:
        do_something(j)

Here is my code:

Private_Email = []
for index,row in emaildf.iterrows():
    for i in row['email']:
        if len(re.findall("gmail|hotmail|yahoo|msn", row['email'])) > 0:
            Private_Email.append(row['email'])
        else:
            Private_Email.append('No Gmail/Hotmail/MSN/Yahoo domains found.')
emaildf['Private_Email'] = Private_Email

This is the error I'm getting:

----> 4 if len(re.findall("gmail|hotmail|yahoo|msn", row['email'])) > 0:
TypeError: expected string or bytes-like object

Note: Input:

re.findall("gmail|hotmail|yahoo|msn", "[email protected]")

Output:

['gmail']

So that's why I'm checking for the length of the list.

CodePudding user response:

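You're getting the TypeError on this line:

----> 4         if len(re.findall("gmail|hotmail|yahoo|msn", row['email'])) > 0:

because row['email'] is a list, not a string, and re.findall expects a string or bytes-like object as its second argument. The inner loop variable i, which holds each individual address, is never passed to re.findall.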
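If you want to keep the explicit loop, a minimal sketch of a fix could look like the following. The sample addresses are made up for illustration, and the names PRIVATE_DOMAINS and NOT_FOUND are hypothetical helpers, not part of the question. It also assumes you want only the matching addresses kept for each row, and it appends once per row rather than once per address, since appending inside the inner loop would produce a list longer than the DataFrame.

```python
import re

import pandas as pd

# Hypothetical sample data; the addresses are made up for illustration.
emaildf = pd.DataFrame({
    "email": [
        ["alice@gmail.com", "bob@example.org"],
        ["carol@example.com"],
    ]
})

# Hypothetical helper names, introduced only for this sketch.
PRIVATE_DOMAINS = re.compile(r"gmail|hotmail|yahoo|msn")
NOT_FOUND = "No Gmail/Hotmail/MSN/Yahoo domains found."

private_email = []
for addresses in emaildf["email"]:
    # Test each string in the list, not the list itself.
    matches = [a for a in addresses if PRIVATE_DOMAINS.search(a)]
    # Append exactly once per row so the result lines up with the DataFrame.
    private_email.append(matches if matches else NOT_FOUND)

# Wrapping in a Series keeps each list as a single cell value.
emaildf["Private_Email"] = pd.Series(private_email, index=emaildf.index)
```

Using re.search instead of len(re.findall(...)) > 0 is a design choice here: it stops at the first match and returns None when there is none, which is all the test needs.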

That said, your particular problem can be solved without iterating over the DataFrame rows at all. Try:

import numpy as np
import pandas as pd
# One row per address; explode() keeps the original row index.
emails = emaildf['email'].explode()
# Keep only addresses on a private domain, then regroup them by original row.
emails = pd.Series(np.where(emails.str.contains("gmail|hotmail|yahoo|msn", na=False), emails, np.nan), index=emails.index)
emails = emails.groupby(emails.index).apply(lambda x: [y for y in x if pd.notna(y)]).apply(lambda x: x if len(x)>1 else (x[0] if len(x)==1 else np.nan))
emaildf['Private_Email'] = np.where(pd.notna(emails), emails, 'No Gmail/Hotmail/MSN/Yahoo domains found.')
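To make the explode-and-regroup idea easier to follow, here is a shorter, self-contained sketch of the same approach on made-up data. It is a variation rather than an exact equivalent: a row with a single match keeps a one-element list instead of a bare string. The addresses and the name NOT_FOUND are assumptions introduced for the example.

```python
import pandas as pd

# Hypothetical sample data; the addresses are made up for illustration.
emaildf = pd.DataFrame({
    "email": [
        ["alice@gmail.com", "bob@example.org", "dave@yahoo.com"],
        ["carol@example.com"],
        ["erin@hotmail.com"],
    ]
})
NOT_FOUND = "No Gmail/Hotmail/MSN/Yahoo domains found."

# One row per address; the original row label is repeated in the index.
exploded = emaildf["email"].explode()

# na=False treats missing values (e.g. from empty lists) as non-matches.
mask = exploded.str.contains("gmail|hotmail|yahoo|msn", na=False)

# Collect the matching addresses back into one list per original row.
matches = exploded[mask].groupby(level=0).agg(list)

# Rows without any match are absent after filtering; reindex restores them as NaN.
matches = matches.reindex(emaildf.index)

# Replace the missing entries with the fallback message.
emaildf["Private_Email"] = pd.Series(
    [m if isinstance(m, list) else NOT_FOUND for m in matches],
    index=emaildf.index,
)
```

Filtering before grouping means the per-row lists never contain placeholders, so no second pass is needed to strip NaN values out of them.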