How to retrieve all the elements from a string that are present in a list

Time:01-14

I have the following list and a DataFrame:

import pandas as pd

the_list = ["one", "et", "allu", "Metall", "54ro", "al89"]

df = pd.DataFrame({ 'ID':[100, 200, 300, 400],
                   'String':['Jonel-al89 (et)', 'Stel-00(et) al89 x 57-mm', 'Metall,   54ro', "allu, Metall9(lop)"]
                  })

What I need is a new column containing all the elements from the list that are present in each string in the "String" column. The output should look like this:

ID   String                    Desired_Column
100  Jonel-al89 (et)           one, al89, et
200  Stel-00(et) al89 x 57-mm  et, al89
300  Metall,   54ro            et, Metall, 54ro
400  allu, Metall9(lop)        allu, et, Metall

What would be the way to achieve it?
Any help would be much appreciated!

CodePudding user response:

You don't even need a regex: a list comprehension can check which elements of your list occur as substrings of each value in the String column.

I'm not sure whether you want the elements as a list or as a string. If you want a string, wrap the comprehension in a str.join call.

import pandas as pd

the_list = ["one", "et", "allu", "Metall", "54ro", 'al89']

df = pd.DataFrame({ 'ID':[100, 200, 300, 400],
                   'String':['Jonel-al89 (et)', 'Stel-00(et) al89 x 57-mm', 'Metall,   54ro', "allu, Metall9(lop)"]
                  })

df["Desired_Column"] = df["String"].apply(lambda string: [el for el in the_list if el in string])

df
# gives
#     ID                    String      Desired_Column
# 0  100           Jonel-al89 (et)     [one, et, al89]
# 1  200  Stel-00(et) al89 x 57-mm          [et, al89]
# 2  300            Metall,   54ro  [et, Metall, 54ro]
# 3  400        allu, Metall9(lop)  [et, allu, Metall]
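
If a string is wanted rather than a list, the join mentioned above could look roughly like the sketch below. The helper name present_as_string and the sample list strings are made up for illustration, and the check is a plain substring test, so "et" is also found inside "Metall".

```python
the_list = ["one", "et", "allu", "Metall", "54ro", "al89"]

def present_as_string(text, candidates=the_list, sep=", "):
    """Join every candidate that occurs as a substring of text (hypothetical helper)."""
    # Same membership test as the list comprehension, wrapped in str.join.
    return sep.join(el for el in candidates if el in text)

# The values of the "String" column from the question.
strings = ['Jonel-al89 (et)', 'Stel-00(et) al89 x 57-mm', 'Metall,   54ro', "allu, Metall9(lop)"]
joined = [present_as_string(s) for s in strings]
print(joined)
```

On the DataFrame itself this should amount to df["String"].apply(present_as_string). The elements come out in the order of the_list, not in the order they appear in the string.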

CodePudding user response:

You can use str.extractall with a crafted regex, then groupby.agg with ', '.join:

import re
pattern = '|'.join(map(re.escape, the_list))
# 'one|et|allu|Metall|54ro|al89'

df['Desired_Column'] = (df['String'].str.extractall(f'({pattern})')[0]
                        .groupby(level=0).agg(', '.join)
                       )

Output (regex matches do not overlap, so the "et" inside "Metall" is not reported separately):

    ID                    String Desired_Column
0  100           Jonel-al89 (et)  one, al89, et
1  200  Stel-00(et) al89 x 57-mm       et, al89
2  300            Metall,   54ro   Metall, 54ro
3  400        allu, Metall9(lop)   allu, Metall
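
If the "et" inside "Metall" should be reported as well, as in the desired output of the question, one possible variant is to wrap the alternation in a lookahead so that matches may overlap. This is a sketch and not part of the answer above. The names overlapping and find_overlapping are made up, and the code assumes that no two list elements start at the same position in a string (only the first alternative that matches at a position is captured).

```python
import re

the_list = ["one", "et", "allu", "Metall", "54ro", "al89"]

# Escape each element, then put the alternation inside a zero-width lookahead.
# The lookahead consumes no characters, so a match may start inside a previous one.
pattern = "|".join(map(re.escape, the_list))
overlapping = re.compile(f"(?=({pattern}))")

def find_overlapping(text):
    """Return the list elements found in text, in order of their start position."""
    # With a single capturing group, findall returns the captured strings.
    return overlapping.findall(text)

strings = ['Jonel-al89 (et)', 'Stel-00(et) al89 x 57-mm', 'Metall,   54ro', "allu, Metall9(lop)"]
found = [", ".join(find_overlapping(s)) for s in strings]
print(found)
```

On the DataFrame this could be applied as df["String"].apply(lambda s: ", ".join(find_overlapping(s))). Unlike the extractall version, a row without any match then yields an empty string instead of NaN.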