I have the following list and a DataFrame:
import pandas as pd
the_list = ["one", "et", "allu", "Metall", "54ro", "al89"]
df = pd.DataFrame({
    'ID': [100, 200, 300, 400],
    'String': ['Jonel-al89 (et)', 'Stel-00(et) al89 x 57-mm', 'Metall, 54ro', "allu, Metall9(lop)"]
})
I need a new column that contains every element of the list that occurs as a substring of the corresponding value in the "String" column. The output should look like this:
ID | String | Desired_Column |
---|---|---|
100 | Jonel-al89 (et) | one, al89, et |
200 | Stel-00(et) al89 x 57-mm | et, al89 |
300 | Metall, 54ro | et, Metall, 54ro |
400 | allu, Metall9(lop) | allu, et, Metall |
What would be the way to achieve it?
Any help would be much appreciated!
CodePudding user response:
You don't even need regex: a list comprehension that checks which elements of your list occur in each value of the String column is enough.
I'm not sure whether you want the elements as a list or as a string; if you want a string, wrap the comprehension in ', '.join(...).
import pandas as pd
the_list = ["one", "et", "allu", "Metall", "54ro", 'al89']
df = pd.DataFrame({ 'ID':[100, 200, 300, 400],
'String':['Jonel-al89 (et)', 'Stel-00(et) al89 x 57-mm', 'Metall, 54ro', "allu, Metall9(lop)"]
})
df["Desired_Column"] = df["String"].apply(lambda string: [el for el in the_list if el in string])
df
# gives
# ID String Desired_Column
# 0 100 Jonel-al89 (et) [one, et, al89]
# 1 200 Stel-00(et) al89 x 57-mm [et, al89]
# 2 300 Metall, 54ro [et, Metall, 54ro]
# 3 400 allu, Metall9(lop) [et, allu, Metall]
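If a comma-separated string is wanted instead of a list, the same comprehension can be passed to ', '.join. This is only a minimal sketch of that variant, reusing the data from the question; the elements come out in the order of the_list, not in the order they appear in each string.

```python
import pandas as pd

the_list = ["one", "et", "allu", "Metall", "54ro", "al89"]
df = pd.DataFrame({
    "ID": [100, 200, 300, 400],
    "String": ["Jonel-al89 (et)", "Stel-00(et) al89 x 57-mm",
               "Metall, 54ro", "allu, Metall9(lop)"],
})

# Same substring test as above, but the matches are joined into one string.
# A row with no matches ends up with an empty string.
df["Desired_Column"] = df["String"].apply(
    lambda string: ", ".join(el for el in the_list if el in string)
)
```

Because this is a plain substring test, "et" is also reported for rows that only contain it inside "Metall".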
CodePudding user response:
You can use str.extractall with a crafted regex, then groupby.agg with ', '.join:
import re
pattern = '|'.join(map(re.escape, the_list))
# 'one|et|allu|Metall|54ro|al89'
df['Desired_Column'] = (df['String'].str.extractall(f'({pattern})')[0]
.groupby(level=0).agg(', '.join)
)
Output (the regex finds non-overlapping matches, so the "et" inside "Metall" is not reported separately):
ID String Desired_Column
0 100 Jonel-al89 (et) one, al89, et
1 200 Stel-00(et) al89 x 57-mm et, al89
2 300 Metall, 54ro Metall, 54ro
3 400 allu, Metall9(lop) allu, Metall
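If the overlapping "et" inside "Metall" should be reported as well, one possible variant is to wrap the alternation in a zero-width lookahead. The sketch below is an assumption about what is wanted, not part of the answer above, and it swaps str.extractall for plain re.findall applied through Series.map. At any one position only the first matching alternative is captured, so two list items that start at the same position would still yield a single match.

```python
import re
import pandas as pd

the_list = ["one", "et", "allu", "Metall", "54ro", "al89"]
df = pd.DataFrame({
    "ID": [100, 200, 300, 400],
    "String": ["Jonel-al89 (et)", "Stel-00(et) al89 x 57-mm",
               "Metall, 54ro", "allu, Metall9(lop)"],
})

# The lookahead consumes no characters, so the scan advances one position at
# a time and can find an item that starts inside another item.
pattern = "|".join(map(re.escape, the_list))
overlapping = re.compile(f"(?=({pattern}))")

# findall returns the captured group for every position where the lookahead
# succeeds; a row with no matches ends up with an empty string.
df["Desired_Column"] = df["String"].map(
    lambda s: ", ".join(overlapping.findall(s))
)
```

With this variant, row 300 becomes "Metall, et, 54ro" and row 400 becomes "allu, Metall, et", ordered by where each match starts in the string.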