How can I extract from a column the word that matches a list?

Time:10-21

I have a DataFrame like the one below. I want to extract the part of the title column that matches an item from a list and put the result in another column.

The list contains strings like these:

mi_lista = ['Automata', 'Pearls', 'Deep learning', 'Patterns']

title                                            author
Introduction to Automata Theory                  John E. Hopcroft
Programming Pearls                               Jon L. Bentley
Deep Learning                                    Ian Goodfellow
Patterns of Enterprise Application Architecture  Martin Fowler
Deep Learning with Python                        John E. Hopcroft
Theory Of Self Reproducing Automata              Jon L. Bentley
Enterprise Integration Patterns                  Ian Goodfellow
Deep Learning: A Practitioner's Approach         Martin Fowler

The result should be a new column in the DataFrame, like this:

title                                            list           author
Introduction to Automata Theory                  Automata       John E. Hopcroft
Programming Pearls                               Pearls         Jon L. Bentley
Deep Learning                                    Deep Learning  Ian Goodfellow
Patterns of Enterprise Application Architecture  Patterns       Martin Fowler
Deep Learning with Python                        Deep Learning  John E. Hopcroft
Theory Of Self Reproducing Automata              Automata       Jon L. Bentley
Enterprise Integration Patterns                  Patterns       Ian Goodfellow
Deep Learning: A Practitioner's Approach         Deep Learning  Martin Fowler

I tried this:

df.insert(1, "list", df['title'].apply(lambda a: ','.join([l for l in mi_lista if l in a.split()])), True)

But I have two issues with this. First, it returns every occurrence of the items I'm searching for:

title                                            list                author
Introduction to Patterns and Automata Theory     Patterns, Automata  John E. Hopcroft

I only want the first occurrence of the searched item.

Second, if the item I'm searching for is at the end of the string, it isn't picked up. For example, the title Deep Learning returns an empty result:

title                                            list           author
Introduction to Automata Theory                  Automata       John E. Hopcroft
Programming Pearls                               Pearls         Jon L. Bentley
Deep Learning                                                   Ian Goodfellow
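
One likely explanation for the empty result, sketched in plain Python with the list exactly as given above: a.split() produces single words, so a two-word item such as 'Deep learning' can never equal one of them, and its lowercase 'l' also differs from the 'Learning' in the title. The position in the string probably plays no role. All names below other than mi_lista are made up for illustration.

```python
mi_lista = ['Automata', 'Pearls', 'Deep learning', 'Patterns']
title = 'Deep Learning'

# Word-by-word membership: the title becomes ['Deep', 'Learning'],
# and no single word equals the two-word item 'Deep learning'.
words = title.split()
matches_by_word = [l for l in mi_lista if l in words]

# Substring test on the whole title: still empty, because the
# comparison is case-sensitive ('learning' vs 'Learning').
matches_by_substring = [l for l in mi_lista if l in title]

# Substring test with both sides lowercased: the item is found.
matches_ignoring_case = [l for l in mi_lista if l.lower() in title.lower()]

print(matches_by_word)        # []
print(matches_by_substring)   # []
print(matches_ignoring_case)  # ['Deep learning']
```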

Thanks for the help!

CodePudding user response:

You should wrap your list comprehension in set to get only the unique matches:

df.insert(1, "list", df['title'].apply(lambda a: ', '.join(set([l for l in mi_lista if l in a]))), True)
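
If only the first occurrence is wanted, set may not be enough by itself, since it drops duplicates but keeps every distinct match. Here is a minimal sketch of one alternative, assuming "first" means the earliest position in the title and that matching should ignore case (so 'Deep learning' finds 'Deep Learning'). The names first_match and pattern are hypothetical helpers, not part of pandas.

```python
import re

mi_lista = ['Automata', 'Pearls', 'Deep learning', 'Patterns']

# One alternation pattern built from the list; re.escape guards against
# special regex characters in the items, re.IGNORECASE ignores case.
pattern = re.compile('|'.join(re.escape(item) for item in mi_lista), re.IGNORECASE)

def first_match(title):
    """Return the leftmost list item found in title, or '' if none matches."""
    found = pattern.search(title)
    # group(0) is the text as it appears in the title, not as in the list.
    return found.group(0) if found else ''

print(first_match('Introduction to Patterns and Automata Theory'))  # Patterns
print(first_match('Deep Learning'))                                 # Deep Learning
```

With the DataFrame from the question, something like df.insert(1, "list", df['title'].apply(first_match)) should fill the new column. The pattern matches substrings, so adding \b word boundaries around it may be worth considering if partial-word hits are a concern.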