I have a list of strings like this:
```python
name_list = ['LIONEL MESSI', 'CRISTIANO RONALD', 'KYLIAN MBAPPÉ']
```
In the CSV file there is a column called `long_name` that contains many player names, with values such as LIONEL ANDRÉS MESSI CUCCITTINI, CRISTIANO RONALDO DOS SANTOS AVEIRO, KYLIAN MBAPPÉ LOTTIN and NEYMAR DA SILVA SANTOS JÚNIOR.
I want to filter the column so that it keeps the names that match a string in the list and drops the names that don't. The problem is that the strings in the list do not match the column values exactly, only partially (for example, LIONEL MESSI vs. LIONEL ANDRÉS MESSI CUCCITTINI).
How can I use this list to filter the column? I tried the line below, but it doesn't work:

```python
df['long_name'].str.contains('|'.join(name_list), regex=True)
```

I also tried this, but it doesn't filter either:

```python
df[pd.notna(df['long_name']) & df['long_name'].astype(str).str.contains('|'.join(squad_list))]
```
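A minimal sketch of what the first attempt actually does, assuming a small DataFrame built from the four `long_name` values quoted above instead of the real CSV. `str.contains` only returns a boolean mask, so the mask still has to be used to index `df`. It also searches for each name as one contiguous substring, so `LIONEL MESSI` does not match `LIONEL ANDRÉS MESSI CUCCITTINI`, while the other two names do match.

```python
import re

import pandas as pd

# Sample data: the long_name values quoted in the question (assumed; the real CSV is not shown)
df = pd.DataFrame({'long_name': [
    'LIONEL ANDRÉS MESSI CUCCITTINI',
    'CRISTIANO RONALDO DOS SANTOS AVEIRO',
    'KYLIAN MBAPPÉ LOTTIN',
    'NEYMAR DA SILVA SANTOS JÚNIOR',
]})
name_list = ['LIONEL MESSI', 'CRISTIANO RONALD', 'KYLIAN MBAPPÉ']

# Escape each name so regex metacharacters are matched literally, then OR them together
pattern = '|'.join(re.escape(name) for name in name_list)

# str.contains only builds a boolean mask; na=False treats missing names as non-matches
mask = df['long_name'].str.contains(pattern, regex=True, na=False)

# The mask must be used to index the DataFrame to actually filter it
filtered = df[mask]
print(filtered['long_name'].tolist())
# ['CRISTIANO RONALDO DOS SANTOS AVEIRO', 'KYLIAN MBAPPÉ LOTTIN']  (Messi is missed)
```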
Answer:
Create a boolean mask with a substring test, similar to this:

```python
'LIONEL' in 'LIONEL MESSI'
```

To list the columns of `df`, use `df.columns`.
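Extending the `in` idea from this answer, here is a sketch that splits each name in `name_list` into words and keeps a row when all words of at least one name occur somewhere in `long_name`. The sample DataFrame and the helper `matches_any` are hypothetical; the `None` entry stands in for a missing value like the ones the question's `pd.notna` check guards against.

```python
import pandas as pd

# Hypothetical sample data: the question's long_name values plus one missing value
df = pd.DataFrame({'long_name': [
    'LIONEL ANDRÉS MESSI CUCCITTINI',
    'CRISTIANO RONALDO DOS SANTOS AVEIRO',
    'KYLIAN MBAPPÉ LOTTIN',
    'NEYMAR DA SILVA SANTOS JÚNIOR',
    None,
]})
name_list = ['LIONEL MESSI', 'CRISTIANO RONALD', 'KYLIAN MBAPPÉ']

# Each short name becomes a list of words, e.g. ['LIONEL', 'MESSI']
name_words = [name.split() for name in name_list]

def matches_any(long_name):
    """Hypothetical helper: True if every word of at least one short name occurs in long_name."""
    if not isinstance(long_name, str):
        return False  # skip None/NaN instead of raising
    return any(all(word in long_name for word in words) for words in name_words)

# Boolean mask built row by row, then used to filter the DataFrame
mask = df['long_name'].apply(matches_any)
filtered = df[mask]
print(filtered['long_name'].tolist())
```

Because `word in long_name` is a plain substring test, `RONALD` also matches `RONALDO`, which is the partial matching wanted here; the trade-off is that very short words could produce false positives.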
Answer:
Use `str.contains` with `regex=True` (here with `re.IGNORECASE` for case-insensitive matching); it works:
```python
import pandas as pd
import re

name_list = ['LIONEL MESSI', 'CRISTIANO RONALD', 'KYLIAN MBAPPÉ']
s = pd.Series(name_list)

s.str.contains('messi|CRISTIANO', regex=True, flags=re.IGNORECASE)
# Out:
# 0     True
# 1     True
# 2    False
# dtype: bool

s.str.contains('|'.join(name_list), regex=True, flags=re.IGNORECASE)
# Out:
# 0    True
# 1    True
# 2    True
# dtype: bool
```
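The example above runs `str.contains` on `name_list` itself, where every name trivially matches. Below is a sketch of a vectorized version applied to the `long_name` column, assuming the same sample values as in the question: each name becomes a regex with one lookahead per word, so the words may appear anywhere in the full name, in any order. `all_words_pattern` is a hypothetical helper; `case` and `na` are standard `Series.str.contains` parameters.

```python
import re

import pandas as pd

# Sample data: the long_name values quoted in the question (assumed)
df = pd.DataFrame({'long_name': [
    'LIONEL ANDRÉS MESSI CUCCITTINI',
    'CRISTIANO RONALDO DOS SANTOS AVEIRO',
    'KYLIAN MBAPPÉ LOTTIN',
    'NEYMAR DA SILVA SANTOS JÚNIOR',
]})
name_list = ['LIONEL MESSI', 'CRISTIANO RONALD', 'KYLIAN MBAPPÉ']

def all_words_pattern(name):
    """Hypothetical helper: anchored regex requiring every word of name, e.g. ^(?=.*LIONEL)(?=.*MESSI)."""
    return '^' + ''.join(f'(?=.*{re.escape(word)})' for word in name.split())

# Non-capturing groups joined with '|': a row matches if any one name's words are all present
pattern = '|'.join(f'(?:{all_words_pattern(name)})' for name in name_list)

# case=False makes the match case-insensitive; na=False drops missing values
mask = df['long_name'].str.contains(pattern, regex=True, case=False, na=False)
print(df.loc[mask, 'long_name'].tolist())
```

Using non-capturing `(?:...)` groups keeps pandas from warning that the pattern has match groups, and the lookaheads let a single `str.contains` call handle names whose words are separated by middle names.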