How to keep dataframe rows containing any of a list of specific strings?

Time:10-13

I have a dataframe with a column level

                     level 

0                     HH
1                     FF
2                     FF
3       C,NN-FRAC,W-PROC
4                    C,D
              ...       
8433            C,W-PROC
8434                 C,D
8435                   D
8436                 C,Q
8437                C,HH

I would like to keep only the rows that contain at least one of these specific strings:

searchfor = ['W','W-OFFSH','W-ONSH','W-GB','W-PROC','W-NGTC','W-TRANS','W-UNSTG','W-LNGSTG','W-LNGIE','W-LDC','X','Y','LL','MM','MM – REF','MM – IMP','MM – EXP','NN','NN-FRAC','NN-LDC','OO'] 

which should give me (from the above extract):

                     level 
1       C,NN-FRAC,W-PROC
2       C,W-PROC

I tried these two string filters, but neither gives me the expected result.

df = df[df['industrytype'].str.contains(searchfor)]

df = df[df['industrytype'].str.contains(','.join(searchfor))]

CodePudding user response:

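It might not be behaving as expected because the values in the column are comma-separated, and str.contains expects a single pattern string rather than a list. Joining the list with commas produces one long string that no cell contains. Instead, you can write a simple function that splits each value at the commas and checks each piece against the list, then use the apply method to run that function on the column.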

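# Named has_searched_code so it does not shadow Python's built-in filter().
def has_searched_code(x):
  # True if any comma-separated piece of the cell is in searchfor.
  return any(i in searchfor for i in x.split(','))

# Keep only the rows for which the function returns True.
df = df[df.industrytype.apply(has_searched_code)]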
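If the dataframe is large, the same split-and-check idea can also be written with pandas string methods instead of a Python-level function. The following is only a sketch under some assumptions: the sample rows are made up to resemble the extract in the question, the column is called `level` as in the printed dataframe (the question's code uses `industrytype`), `searchfor` is shortened, and `keep_matching_rows` is a hypothetical helper name. It also strips whitespace around each piece, which the function above does not do.

```python
import pandas as pd

# Hypothetical sample data modelled on the extract in the question.
df = pd.DataFrame(
    {"level": ["HH", "FF", "FF", "C,NN-FRAC,W-PROC", "C,D", "C,W-PROC", "D", "C,HH"]}
)

# Shortened version of the question's list; a set keeps membership tests cheap.
searchfor = {"W", "W-PROC", "NN", "NN-FRAC", "X", "Y"}


def keep_matching_rows(frame, column, codes):
    """Return rows whose comma-separated `column` contains any of `codes`."""
    # One row per piece; the original row label is repeated for each piece.
    pieces = frame[column].str.split(",").explode().str.strip()
    # A row is kept if at least one of its pieces is an exact match.
    mask = pieces.isin(codes).groupby(level=0).any()
    return frame[mask]


result = keep_matching_rows(df, "level", searchfor)
print(result)
```

Because each piece is compared as a whole, a code such as W only matches a piece that is exactly W, not W-PROC, which is the same behaviour as the split-based function above.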