I have a dataframe with a column named level:
level
0 HH
1 FF
2 FF
3 C,NN-FRAC,W-PROC
4 C,D
...
8433 C,W-PROC
8434 C,D
8435 D
8436 C,Q
8437 C,HH
I would like to keep only the rows that contain at least one of these strings:
searchfor = ['W','W-OFFSH','W-ONSH','W-GB','W-PROC','W-NGTC','W-TRANS','W-UNSTG','W-LNGSTG','W-LNGIE','W-LDC','X','Y','LL','MM','MM – REF','MM – IMP','MM – EXP','NN','NN-FRAC','NN-LDC','OO']
which should give me (from the above extract):
level
1 C,NN-FRAC,W-PROC
2 C,W-PROC
I tried these two string filters, but neither gives me the expected result.
df = df[df['industrytype'].str.contains(searchfor)]
df = df[df['industrytype'].str.contains(','.join(searchfor))]
CodePudding user response:
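It does not behave as expected because str.contains takes a single pattern string, not a list, and ','.join(searchfor) builds one long string that a row would have to contain in full. Since each cell holds comma-separated values, you can write a simple function that splits the value at the commas and checks each piece against searchfor. You can then use the apply method to run that function on the column.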
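def has_match(x):
    # Split the comma-separated cell and test each piece against searchfor.
    # (Named has_match rather than filter to avoid shadowing the built-in.)
    for i in x.split(','):
        if i in searchfor:
            return True
    return False

df = df[df['industrytype'].apply(has_match)]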
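For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea. It assumes the column is called industrytype (as in the question's code, although the printed extract labels it level) and rebuilds a few of the rows shown in the question as sample data. The names has_match, pattern, filtered and filtered_regex are only illustrative. It also shows one possible regex variant of the str.contains attempt, which joins the escaped codes with | and requires each code to sit between commas or at the start or end of the string.

```python
import re

import pandas as pd

# Sample rows copied from the extract in the question (column name is an assumption).
df = pd.DataFrame(
    {"industrytype": ["HH", "FF", "FF", "C,NN-FRAC,W-PROC", "C,D",
                      "C,W-PROC", "C,D", "D", "C,Q", "C,HH"]}
)

searchfor = ['W', 'W-OFFSH', 'W-ONSH', 'W-GB', 'W-PROC', 'W-NGTC', 'W-TRANS',
             'W-UNSTG', 'W-LNGSTG', 'W-LNGIE', 'W-LDC', 'X', 'Y', 'LL', 'MM',
             'MM – REF', 'MM – IMP', 'MM – EXP', 'NN', 'NN-FRAC', 'NN-LDC', 'OO']

# A set makes each membership test cheap.
wanted = set(searchfor)


def has_match(value):
    """Return True if any comma-separated piece of value is in wanted."""
    return not wanted.isdisjoint(value.split(","))


# Approach 1: split each cell and compare the pieces.
filtered = df[df["industrytype"].apply(has_match)]

# Approach 2 (alternative sketch): a single regex for str.contains.
# Each code is escaped and must be delimited by commas or string boundaries,
# so 'W' does not match inside 'W-PROC' by accident.
pattern = r"(?:^|,)(?:" + "|".join(re.escape(s) for s in searchfor) + r")(?:,|$)"
filtered_regex = df[df["industrytype"].str.contains(pattern)]

print(filtered)
```

On this sample both approaches keep only the rows C,NN-FRAC,W-PROC and C,W-PROC. The split-based version is easier to read; the regex version avoids a Python-level function call per row, which may matter on larger frames.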