I want to search for names in column col_one where I have a list of names in the variable list20. When searching, if the value of col_one matches in list20, put the same name in a new column named new_col. How to do ?
my data
import pandas as pd
list20 = ['CFER', 'LOU', 'ABC', 'BTT', 'JKK', 'TUUU']
data = {
"col_one": ["CFER", "ABCP6P45C9", "LOU-W5", "CFER-R", "ABC-W1", "LOU13C2465"],
}
df = pd.DataFrame(data)
df
Expected Output
CodePudding user response:
Try with str.extract
df['new'] = df['col_one'].str.extract('(' '|'.join(list20) ')')[0]
df
Out[121]:
col_one new
0 CFER CFER
1 ABCP6P45C9 ABC
2 LOU-W5 LOU
3 CFER-R CFER
4 ABC-W1 ABC
5 LOU13C2465 LOU
CodePudding user response:
One way to do this, less attractive in terms of efficiency, is to use a simple function with a lambda such that:
def matcher(col_one):
for i in list20:
if i in col_one:
return i
return 'na' #adjust as you see fit
df['new_col'] = df.apply(lambda x: matcher(x['col_one']), axis=1)
df
expected results:
col_one new_col
0 CFER CFER
1 ABCP6P45C9 ABC
2 LOU-W5 LOU
3 CFER-R CFER
4 ABC-W1 ABC
5 LOU13C2465 LOU
CodePudding user response:
Another approach:
pattern = re.compile(r"|".join(x for x in list20))
(df
.assign(new=lambda x: [re.findall(pattern, string)[0] for string in x.col_one])
)