Home > Back-end >  Search columns with list of string for a specific set of text and if the text is found enter new a n
Search columns with list of string for a specific set of text and if the text is found enter new a n

Time:10-09

I want to search for names in column col_one where I have a list of names in the variable list20. When searching, if the value of col_one matches in list20, put the same name in a new column named new_col. How to do ?

my data

import pandas as pd

list20 = ['CFER', 'LOU', 'ABC', 'BTT', 'JKK', 'TUUU']

data = {
  "col_one": ["CFER", "ABCP6P45C9", "LOU-W5", "CFER-R", "ABC-W1", "LOU13C2465"],
}

df = pd.DataFrame(data)
df

Expected Output

enter image description here

CodePudding user response:

Try with str.extract

df['new'] = df['col_one'].str.extract('(' '|'.join(list20) ')')[0]
df
Out[121]: 
      col_one   new
0        CFER  CFER
1  ABCP6P45C9   ABC
2      LOU-W5   LOU
3      CFER-R  CFER
4      ABC-W1   ABC
5  LOU13C2465   LOU

CodePudding user response:

One way to do this, less attractive in terms of efficiency, is to use a simple function with a lambda such that:

def matcher(col_one):
    for i in list20:
        if i in col_one:
            return i
    return 'na' #adjust as you see fit

df['new_col'] = df.apply(lambda x: matcher(x['col_one']), axis=1)
df

expected results:

    col_one     new_col
0   CFER        CFER
1   ABCP6P45C9  ABC
2   LOU-W5      LOU
3   CFER-R      CFER
4   ABC-W1      ABC
5   LOU13C2465  LOU

CodePudding user response:

Another approach:

pattern = re.compile(r"|".join(x for x in list20))

(df
 .assign(new=lambda x: [re.findall(pattern, string)[0] for string in x.col_one])
)
  • Related