filtering a column list from a term starting from the back

Time:10-19

I have this dataset:

import pandas as pd

emails1 = ['[email protected]', "[email protected]", "[email protected]"]
emails2 = ['[email protected]', "[email protected]", '[email protected]', "[email protected]"]
emails3 =  ["[email protected]", '[email protected]']

terms = ['@gmail.com', 'data', 'ddd@']

df = pd.DataFrame([emails1, emails2, emails3])

df["emails"] = df.apply(lambda x: list([x[0],
                        x[1],
                        x[2],
                        x[3]]),axis=1)

df = df.iloc[: , 4:]
df
    emails
0   [[email protected], [email protected], [email protected], None]
1   [[email protected], [email protected], [email protected], [email protected]]
2   [[email protected], [email protected], None, None]

For each list, I need to find the first item (searching from the end of the list) that contains one of the strings in terms, so my output would be another column:

    emails                                                             email wanted
0   [[email protected], [email protected], [email protected], None]            [[email protected]]
1   [[email protected], [email protected], [email protected], [email protected]]     [[email protected]]
2   [[email protected], [email protected], None, None]                         [[email protected]]

I tried this for each of the terms and combined the results, but it does not work:

df["emails"].apply(lambda x:[i for i in x if '@gmail.com' in i])

Is there a good way of doing this?

CodePudding user response:

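The exact matching logic is unclear, but assuming you want the last item in each list that contains any of the terms, you can compile the terms into a single regex and use a list comprehension that scans each list in reverse: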

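import re

# Build one alternation pattern from the terms; re.escape keeps '.' literal.
regex = re.compile('|'.join(map(re.escape, terms)))
# r'@gmail\.com|data|ddd@'

# Walk each list from the back and keep the first non-None item that matches;
# next() falls back to None when no item matches.
df['wanted'] = [next((x for x in lst[::-1] if x and regex.search(x)), None)
                for lst in df['emails']]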

output:

                                              emails           wanted
0  [[email protected], [email protected], [email protected]...  [email protected]
1  [[email protected], [email protected], [email protected]...    [email protected]
2         [[email protected], [email protected], None, None]    [email protected]
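If you would rather avoid a regex, the same idea can be sketched with plain substring checks. This is only an illustration: the addresses below are made-up placeholders (the real ones are obscured in the post), `last_match` is a hypothetical helper name, and it assumes "from the terms array" means "contains any term as a substring".

```python
import pandas as pd

# Placeholder addresses (hypothetical); lists are padded with None as in the question.
rows = [
    ["alice@gmail.com", "bob@data.org", "carol@example.com", None],
    ["dave@example.com", "ddd@example.net", "erin@gmail.com", "frank@example.com"],
    ["grace@gmail.com", "heidi@example.com", None, None],
]
terms = ["@gmail.com", "data", "ddd@"]

df = pd.DataFrame({"emails": rows})


def last_match(items, terms):
    """Return the last string in items that contains any term, else None."""
    for item in reversed(items):
        # Skip None (or any other non-string) padding values.
        if isinstance(item, str) and any(term in item for term in terms):
            return item
    return None


df["wanted"] = [last_match(lst, terms) for lst in df["emails"]]
print(df["wanted"].tolist())
# ['bob@data.org', 'erin@gmail.com', 'grace@gmail.com']
```

The `isinstance` check is what keeps the `None` padding from raising a `TypeError`, which is the likely reason the original `'@gmail.com' in i` attempt failed. If you need a one-element list per row, as in the desired output of the question, wrap the returned value in a list.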