How can I extract from a column the word that matches a list?

I have a DataFrame like the one below. I want to extract the part of the title column that matches an item in a list and place the result in another column.

The list contains strings like the following:

mi_lista = ['Automata', 'Pearls', 'Deep learning', 'Patterns']

title	author
Introduction to Automata Theory	John E. Hopcroft
Programming Pearls	Jon L. Bentley
Deep Learning	Ian Goodfellow
Patterns of Enterprise Application Architecture	Martin Fowler
Deep Learning with Python	John E. Hopcroft
Theory Of Self Reproducing Automata	Jon L. Bentley
Enterprise Integration Patterns	Ian Goodfellow
Deep Learning: A Practitioner's Approach	Martin Fowler

The desired result is a new column in the DataFrame, like this:

title	list	author
Introduction to Automata Theory	Automata	John E. Hopcroft
Programming Pearls	Pearls	Jon L. Bentley
Deep Learning	Deep Learning	Ian Goodfellow
Patterns of Enterprise Application Architecture	Patterns	Martin Fowler
Deep Learning with Python	Deep Learning	John E. Hopcroft
Theory Of Self Reproducing Automata	Automata	Jon L. Bentley
Enterprise Integration Patterns	Patterns	Ian Goodfellow
Deep Learning: A Practitioner's Approach	Deep Learning	Martin Fowler

I tried this:

df.insert(1, "list", df['title'].apply(lambda a: ','.join([l for l in mi_lista if l in a.split()])), True)

But I have two issues with this. First, it returns all occurrences of the items I'm searching for:

title	list	author
Introduction to Patterns and Automata Theory	Patterns, Automata	John E. Hopcroft

And I just want the first occurrence of the searched item.

Secondly, if the item I'm searching for is at the end of the string, it isn't taken into account. For example, the title Deep Learning returns an empty result:

title	list	author
Introduction to Automata Theory	Automata	John E. Hopcroft
Programming Pearls	Pearls	Jon L. Bentley
Deep Learning		Ian Goodfellow

Thanks for the help!

CodePudding user response:

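You should wrap your list comprehension in set() to keep only the unique matches, and test each item against the whole title (l in a) rather than against a.split(), so that multi-word items can match: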

df.insert(1, "list", df['title'].apply(lambda a: ', '.join(set([l for l in mi_lista if l in a]))), True)
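
If only the first occurrence is wanted, a regular expression may be a better fit than set(), since a set does not keep the order of the matches. The sketch below is one possible approach, not the only one. It assumes that "first" means the leftmost match in the title, that matching should ignore case (the list holds 'Deep learning' while the titles say 'Deep Learning'), and that the text should be returned as it appears in the title. The name first_match is hypothetical and chosen for illustration. The first two lines also show why the original attempt returned nothing for Deep Learning: split() breaks the title into single words, and the comparison is case-sensitive.

```python
import re

mi_lista = ['Automata', 'Pearls', 'Deep learning', 'Patterns']

# Why the original attempt fails for multi-word items:
print('Deep learning' in 'Deep Learning'.split())  # False: split() yields ['Deep', 'Learning']
print('Deep learning' in 'Deep Learning')          # False: the substring test is case-sensitive

# One alternation over all search terms. re.escape protects any special
# characters, and \b keeps the matches on word boundaries.
pattern = re.compile(
    r'\b(?:' + '|'.join(re.escape(term) for term in mi_lista) + r')\b',
    flags=re.IGNORECASE,
)

def first_match(title):
    """Return the leftmost search term found in title, or '' if there is none."""
    found = pattern.search(title)
    return found.group(0) if found else ''

titles = [
    'Introduction to Automata Theory',
    'Deep Learning',
    'Introduction to Patterns and Automata Theory',
    "Deep Learning: A Practitioner's Approach",
]

for title in titles:
    print(title, '->', first_match(title))
```

With pandas, the same function could be applied with df.insert(1, "list", df['title'].apply(first_match)), which mirrors the insert call used above. Titles without any match get an empty string in the new column.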