I have a DataFrame like the one below. I want to extract the part of each title that matches an item from a list and put the result in a new column.
The list consists of strings like these:
mi_lista = ['Automata', 'Pearls', 'Deep learning', 'Patterns']
title | author |
---|---|
Introduction to Automata Theory | John E. Hopcroft |
Programming Pearls | Jon L. Bentley |
Deep Learning | Ian Goodfellow |
Patterns of Enterprise Application Architecture | Martin Fowler |
Deep Learning with Python | John E. Hopcroft |
Theory Of Self Reproducing Automata | Jon L. Bentley |
Enterprise Integration Patterns | Ian Goodfellow |
Deep Learning: A Practitioner's Approach | Martin Fowler |
The desired result is a new column in the DataFrame, like this:
title | list | author |
---|---|---|
Introduction to Automata Theory | Automata | John E. Hopcroft |
Programming Pearls | Pearls | Jon L. Bentley |
Deep Learning | Deep Learning | Ian Goodfellow |
Patterns of Enterprise Application Architecture | Patterns | Martin Fowler |
Deep Learning with Python | Deep Learning | John E. Hopcroft |
Theory Of Self Reproducing Automata | Automata | Jon L. Bentley |
Enterprise Integration Patterns | Patterns | Ian Goodfellow |
Deep Learning: A Practitioner's Approach | Deep Learning | Martin Fowler |
I tried this:
df.insert(1, "list", df['title'].apply(lambda a: ','.join([l for l in mi_lista if l in a.split()])), True)
But I have two issues with this. First, it returns every matching item:
title | list | author |
---|---|---|
Introduction to Patterns and Automata Theory | Patterns, Automata | John E. Hopcroft |
I only want the first match.
Second, if the item I'm searching for is at the end of the string, it isn't detected. For example, the title Deep Learning returns an empty result:
title | list | author |
---|---|---|
Introduction to Automata Theory | Automata | John E. Hopcroft |
Programming Pearls | Pearls | Jon L. Bentley |
Deep Learning |  | Ian Goodfellow |
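The second issue is probably not about the item's position: in the output above, 'Pearls' is found at the end of 'Programming Pearls'. A more likely cause is that a.split() breaks the title into single words, so a multi-word item such as 'Deep Learning' can never equal one of them. On top of that, the list spells 'Deep learning' with a lowercase l, and Python's `in` test is case-sensitive. A quick check in plain Python (the variable names here are just for illustration):

```python
title = 'Deep Learning'

# split() produces single words, so a two-word term is never an element.
words = title.split()
print(words)                      # ['Deep', 'Learning']
print('Deep Learning' in words)   # False

# A substring test on the whole string can find multi-word terms...
print('Deep Learning' in title)   # True
# ...but it is case-sensitive, so the list's 'Deep learning' still misses.
print('Deep learning' in title)   # False
# Lower-casing both sides is one way to make the comparison case-insensitive.
print('Deep learning'.lower() in title.lower())  # True
```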
Thanks for the help!
Answer:
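You can wrap your list comprehension in `set` to get only the unique matches. Also test `l in a` (a substring check on the whole title) instead of `l in a.split()`: splitting breaks the title into single words, so a multi-word item like "Deep Learning" can never match.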
df.insert(1, "list", df['title'].apply(lambda a: ', '.join(set([l for l in mi_lista if l in a]))), True)
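If "first occurrence" means the item that appears earliest in the title, a regex alternation with pandas' `Series.str.extract` is one way to get exactly one match per row. This is a sketch, assuming the column is named 'title' as in the tables above and that a case-insensitive, whole-word match is wanted (so 'Deep learning' in the list matches 'Deep Learning' in the title). The extracted text comes from the title itself, and rows with no match get NaN.

```python
import re

import pandas as pd

mi_lista = ['Automata', 'Pearls', 'Deep learning', 'Patterns']

# Sample data copied from the question's input table.
df = pd.DataFrame({
    'title': [
        'Introduction to Automata Theory',
        'Programming Pearls',
        'Deep Learning',
        'Patterns of Enterprise Application Architecture',
        'Deep Learning with Python',
        'Theory Of Self Reproducing Automata',
        'Enterprise Integration Patterns',
        "Deep Learning: A Practitioner's Approach",
    ],
    'author': [
        'John E. Hopcroft', 'Jon L. Bentley', 'Ian Goodfellow', 'Martin Fowler',
        'John E. Hopcroft', 'Jon L. Bentley', 'Ian Goodfellow', 'Martin Fowler',
    ],
})

# One capture group with every term as an alternative; re.escape guards against
# regex metacharacters in the terms, and \b restricts matches to whole words.
pattern = r'\b(' + '|'.join(re.escape(term) for term in mi_lista) + r')\b'

# str.extract keeps only the first (leftmost) match in each title; with
# expand=False and a single group it returns a Series. IGNORECASE lets
# 'Deep learning' match 'Deep Learning'; titles with no match get NaN.
df.insert(1, 'list', df['title'].str.extract(pattern, flags=re.IGNORECASE, expand=False))
print(df)
```

For a title such as 'Introduction to Patterns and Automata Theory' this yields only 'Patterns', the term that appears first in the string. If you instead want the first term in mi_lista order, `next((l for l in mi_lista if l in a), '')` inside your apply returns a single item rather than a joined string.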