How can I filter a column using a list of strings that only partially match the column values?

Time:12-18

I have a list of strings like this:

name_list=['LIONEL MESSI','CRISTIANO RONALD','KYLIAN MBAPPÉ']

The CSV file has a column called 'long_name' that contains many player names, with values such as LIONEL ANDRÉS MESSI CUCCITTINI, CRISTIANO RONALDO DOS SANTOS AVEIRO, KYLIAN MBAPPÉ LOTTIN and NEYMAR DA SILVA SANTOS JÚNIOR.

I want to filter the column so that rows whose name contains one of the strings in the list are kept and all other names are dropped. The strings in the list do not match the column values exactly, only partially.

How can I use this list to filter the column? I have tried the line below, but it doesn't work:

df['long_name'].str.contains('|'.join(name_list),regex=True)

I also tried this code, but it doesn't filter either:

df[pd.notna(df['long_name']) & df['long_name'].astype(str).str.contains('|'.join(squad_list))]

CodePudding user response:

Just create a boolean mask using a substring test similar to this:

'LIONEL' in 'LIONEL MESSI'

To list the columns of df, use df.columns.
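As a rough sketch of that idea applied to the question's data, the `in` test can be run once per row and per list entry. One thing to note: 'LIONEL MESSI' is not a contiguous substring of 'LIONEL ANDRÉS MESSI CUCCITTINI', so this version assumes that "partially match" means every word of a list entry occurs somewhere in the long name. The DataFrame is built by hand from the names quoted in the question instead of being read from the CSV, and `matches_any` is a hypothetical helper name.

```python
import pandas as pd

name_list = ['LIONEL MESSI', 'CRISTIANO RONALD', 'KYLIAN MBAPPÉ']

# Stand-in for the CSV, using the names quoted in the question.
df = pd.DataFrame({'long_name': [
    'LIONEL ANDRÉS MESSI CUCCITTINI',
    'CRISTIANO RONALDO DOS SANTOS AVEIRO',
    'KYLIAN MBAPPÉ LOTTIN',
    'NEYMAR DA SILVA SANTOS JÚNIOR',
]})

def matches_any(long_name, names):
    """Hypothetical helper: True if every word of some entry in `names`
    occurs in `long_name` (plain substring test via `in`)."""
    if not isinstance(long_name, str):  # NaN / missing values never match
        return False
    return any(
        all(word in long_name for word in name.split())
        for name in names
    )

# Boolean mask, one value per row.
mask = df['long_name'].apply(lambda s: matches_any(s, name_list))

# Indexing with the mask is what actually filters the rows.
filtered = df[mask]
print(filtered)
```

With this data the mask keeps the Messi, Ronaldo and Mbappé rows and drops Neymar. The word test is order-independent and case-sensitive; whether that is the right notion of a partial match depends on the real data.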

CodePudding user response:

Use str.contains. It will work:

import pandas as pd
import re

name_list=['LIONEL MESSI','CRISTIANO RONALD','KYLIAN MBAPPÉ']
s = pd.Series(name_list)

s.str.contains('messi|CRISTIANO', regex=True, flags=re.IGNORECASE)
Out:
    0     True
    1     True
    2    False
    dtype: bool
s.str.contains('|'.join(name_list), regex=True, flags=re.IGNORECASE)
Out:
    0    True
    1    True
    2    True
    dtype: bool
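
Note that `str.contains` only returns a boolean Series; to filter the DataFrame from the question, that Series still has to be used as an index. The following is a minimal sketch under a few assumptions: the DataFrame is built by hand from the names in the question, `re.escape` is added to guard against regex metacharacters in names, `na=False` treats missing values as non-matches, and `case=False` stands in for `flags=re.IGNORECASE`. It also shows one possible way to loosen the pattern, since 'LIONEL MESSI' does not appear contiguously in 'LIONEL ANDRÉS MESSI CUCCITTINI'.

```python
import re
import pandas as pd

name_list = ['LIONEL MESSI', 'CRISTIANO RONALD', 'KYLIAN MBAPPÉ']

# Stand-in for the CSV, using the names quoted in the question.
df = pd.DataFrame({'long_name': [
    'LIONEL ANDRÉS MESSI CUCCITTINI',
    'CRISTIANO RONALDO DOS SANTOS AVEIRO',
    'KYLIAN MBAPPÉ LOTTIN',
    'NEYMAR DA SILVA SANTOS JÚNIOR',
]})

# 1) Contiguous-substring pattern, as in the answer above.
pattern = '|'.join(re.escape(name) for name in name_list)
mask = df['long_name'].str.contains(pattern, regex=True, case=False, na=False)
substring_hits = df[mask]  # Ronaldo and Mbappé only; Messi is missed

# 2) Looser pattern (an assumption about the intended matching):
#    the words of each entry must appear in order, with anything in between,
#    e.g. 'LIONEL.*MESSI'.
loose_pattern = '|'.join(
    '.*'.join(re.escape(word) for word in name.split())
    for name in name_list
)
loose_mask = df['long_name'].str.contains(loose_pattern, regex=True, case=False, na=False)
loose_hits = df[loose_mask]  # Messi, Ronaldo and Mbappé

print(substring_hits)
print(loose_hits)
```

The looser pattern can over-match on large datasets (any name containing both words in that order is kept), so it is worth inspecting the result before relying on it.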