I have a list of names, which I convert to a DataFrame:
name= ["John Lewis", "Michael Armstrong", "Kurt Abela","Brian Watson", "Gregory Dubois"]
name = pd.DataFrame(name)
I also have a pandas.DataFrame called df:
df = pd.DataFrame(
    {'Name': ['Karan Singh,John Lewis', 'Michael Armstrong, Fabian Schreiber', 'Roy Dalhuisen', 'Arya Yildirim,Gregory Dubois'],
     'ID': [23, 22, 21, 24]})
Now I would like to filter df so that only the rows containing a name that occurs in name remain after filtering.
I tried this, but it didn't work:
df = df[~df.index.isin(name.index)]
CodePudding user response:
You can use apply like this:
import pandas as pd
main_name= ["John Lewis","Michael Armstrong","Kurt Abela","Brian Watson","Gregory Dubois"]
df={'Name':['Karan Singh,John Lewis','Michael Armstrong, Fabian Schreiber','Roy Dalhuisen','Arya Yildirim,Gregory Dubois',"hh,bb"],'ID':[23,22,21,24,28]}
#df to pandas
df = pd.DataFrame(df)
print(df)
def filter_names(row):
    # Split on commas and strip surrounding whitespace before comparing
    names = [n.strip() for n in row.split(',')]
    return any(name in names for name in main_name)
df_filtered = df[df['Name'].apply(filter_names)]
print(df_filtered)
Result
                                  Name  ID
0               Karan Singh,John Lewis  23
1  Michael Armstrong, Fabian Schreiber  22
2                        Roy Dalhuisen  21
3         Arya Yildirim,Gregory Dubois  24
4                                hh,bb  28

                                  Name  ID
0               Karan Singh,John Lewis  23
1  Michael Armstrong, Fabian Schreiber  22
3         Arya Yildirim,Gregory Dubois  24
CodePudding user response:
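You can use isin like this:
filtered_df = df[df['Name'].isin(name)]
Note that isin only matches when the whole cell equals one of the names, so it will not match cells that hold several comma-separated names.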
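A quick check on the question's data suggests that isin on the unsplit column compares each whole cell with the list, so cells holding several names never match. A minimal sketch, assuming name is the plain Python list rather than the DataFrame built from it, and assuming whitespace around commas should be ignored:

```python
import pandas as pd

name = ["John Lewis", "Michael Armstrong", "Kurt Abela", "Brian Watson", "Gregory Dubois"]
df = pd.DataFrame({
    'Name': ['Karan Singh,John Lewis', 'Michael Armstrong, Fabian Schreiber',
             'Roy Dalhuisen', 'Arya Yildirim,Gregory Dubois'],
    'ID': [23, 22, 21, 24],
})

# Whole-cell comparison: no cell equals a single name exactly, so nothing matches
whole_cell = df[df['Name'].isin(name)]
print(len(whole_cell))  # 0

# Split each cell into individual names first, then test every name
parts = df['Name'].str.split(',').explode().str.strip()
# A row is kept if any of its names is in the list
mask = parts.isin(name).groupby(level=0).any()
per_name = df[mask]
print(per_name['ID'].tolist())  # [23, 22, 24]
```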
CodePudding user response:
Example
name= ["John Lewis","Michael Armstrong","Kurt Abela","Brian Watson","Gregory Dubois"]
data = {'Name':['Karan Singh,John Lewis','Michael Armstrong, Fabian Schreiber','Roy Dalhuisen','Arya Yildirim,Gregory Dubois'],'ID':[23,22,21,24]}
df = pd.DataFrame(data)
df
                                  Name  ID
0               Karan Singh,John Lewis  23
1  Michael Armstrong, Fabian Schreiber  22
2                        Roy Dalhuisen  21
3         Arya Yildirim,Gregory Dubois  24
I don't know exactly what you want, so here are two options.
Code1
out = (df.assign(Name=df['Name'].str.split(','))
         .explode('Name')[lambda x: x['Name'].isin(name)])
out
                Name  ID
0         John Lewis  23
1  Michael Armstrong  22
3     Gregory Dubois  24
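Code1 splits on a bare comma, so a name that follows ", " keeps a leading space and would not match the list. A hedged variant is sketched here, assuming whitespace around commas should be ignored and that the surviving names should be joined back into one row per original row (both are assumptions, not something the question states). The last row, with ID 25, is hypothetical and only added to show both effects:

```python
import pandas as pd

name = ["John Lewis", "Michael Armstrong", "Kurt Abela", "Brian Watson", "Gregory Dubois"]
df = pd.DataFrame({
    'Name': ['Karan Singh,John Lewis', 'Michael Armstrong, Fabian Schreiber',
             'Roy Dalhuisen', 'Arya Yildirim,Gregory Dubois',
             'Brian Watson, Kurt Abela'],  # hypothetical extra row
    'ID': [23, 22, 21, 24, 25],
})

# Split each cell, strip whitespace from every piece, then put one name per row
split_names = df['Name'].str.split(',').apply(lambda parts: [p.strip() for p in parts])
exploded = df.assign(Name=split_names).explode('Name')

# Keep only the names that appear in the list
kept = exploded[exploded['Name'].isin(name)]

# Join the surviving names back into one row per original index
out = kept.groupby(level=0).agg({'Name': ', '.join, 'ID': 'first'})
print(out)
```

Rows in which no name matches (here, index 2) are dropped entirely.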
Code2
out = df[df['Name'].str.contains('|'.join(name))]
out
                                  Name  ID
0               Karan Singh,John Lewis  23
1  Michael Armstrong, Fabian Schreiber  22
3         Arya Yildirim,Gregory Dubois  24
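Note that str.contains treats the pattern as a regular expression by default, so a name containing characters such as "." or "(" would change what the pattern matches. A small sketch, assuming the names should be matched literally, escapes each name with re.escape before joining:

```python
import re
import pandas as pd

name = ["John Lewis", "Michael Armstrong", "Kurt Abela", "Brian Watson", "Gregory Dubois"]
df = pd.DataFrame({
    'Name': ['Karan Singh,John Lewis', 'Michael Armstrong, Fabian Schreiber',
             'Roy Dalhuisen', 'Arya Yildirim,Gregory Dubois'],
    'ID': [23, 22, 21, 24],
})

# Escape regex metacharacters so every name is matched as literal text
pattern = '|'.join(re.escape(n) for n in name)
out = df[df['Name'].str.contains(pattern)]
print(out['ID'].tolist())  # [23, 22, 24]
```

This is still a substring match, so a longer name that merely contains one of the listed names would also be kept.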
Update
name= ["John Lewis", "Michael Armstrong", "Kurt Abela","Brian Watson", "Gregory Dubois"]
name = pd.DataFrame(name)
df = pd.DataFrame(
    {'Name': [['Karan Singh', 'John Lewis'], ['Michael Armstrong', 'Fabian Schreiber'], ['Roy Dalhuisen'], ['Arya Yildirim', 'Gregory Dubois']],
     'ID': [23, 22, 21, 24]})
name
                   0
0         John Lewis
1  Michael Armstrong
2         Kurt Abela
3       Brian Watson
4     Gregory Dubois
df
                                    Name  ID
0              [Karan Singh, John Lewis]  23
1  [Michael Armstrong, Fabian Schreiber]  22
2                        [Roy Dalhuisen]  21
3        [Arya Yildirim, Gregory Dubois]  24
Code
out = df[df['Name'].apply(lambda x: name[0].isin(x).sum() > 0)]
out
                                    Name  ID
0              [Karan Singh, John Lewis]  23
1  [Michael Armstrong, Fabian Schreiber]  22
3        [Arya Yildirim, Gregory Dubois]  24
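For the list-valued column, an alternative sketch (an assumption about what is wanted, not the method used above) builds a Python set from the names once and keeps the rows whose list shares at least one element with it. This avoids calling isin on the whole name column for every row:

```python
import pandas as pd

name = pd.DataFrame(["John Lewis", "Michael Armstrong", "Kurt Abela",
                     "Brian Watson", "Gregory Dubois"])
df = pd.DataFrame({
    'Name': [['Karan Singh', 'John Lewis'], ['Michael Armstrong', 'Fabian Schreiber'],
             ['Roy Dalhuisen'], ['Arya Yildirim', 'Gregory Dubois']],
    'ID': [23, 22, 21, 24],
})

# Build the lookup set once from the single column of the name DataFrame
wanted = set(name[0])

# Keep a row when its list of names overlaps the wanted set
out = df[df['Name'].apply(lambda names: not wanted.isdisjoint(names))]
print(out['ID'].tolist())  # [23, 22, 24]
```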