Delete rows from pandas dataframe by using regex

Time:01-19

This is my dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        'a': [
            '#x{LA 0.098:abc}',
            '#x{LA abc:0.31}',
            '#x{BC abc:0.1231}',
            '#x{LA 0.333:abc}',
            '#x{CN 0.031:abc}',
            '#x{YM abc:12345}',
            '#x{YM 1222:abc}',
        ]
    }
)

I have two lists of IDs that determine which rows to delete, based on the position of "abc" relative to the colon, that is, whether abc is on the right or the left side of the colon. These are my lists:

labels_that_abc_is_right = ['LA', 'CN']
labels_that_abc_is_left = ['YM', 'BC']

For example, I want to delete rows that contain LA where abc is on the right side of the colon; the same applies to CN. I want to delete rows that contain YM (and likewise BC) where abc is on the left side of the colon. This is just a sample; I have hundreds of IDs. This is the output that I want after deleting rows:

                 a
1    #x{LA abc:0.31}
6    #x{YM 1222:abc}
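The deletion rule above can be restated as a small per-row predicate. The sketch below is an illustration, not the asker's code: it assumes every cell has the form #x{<LABEL> <left>:<right>}, and the names CELL_RE and should_drop are hypothetical. On the sample data it keeps rows 1 and 6, matching the expected output:

```python
import re

import pandas as pd

df = pd.DataFrame(
    {
        'a': [
            '#x{LA 0.098:abc}',
            '#x{LA abc:0.31}',
            '#x{BC abc:0.1231}',
            '#x{LA 0.333:abc}',
            '#x{CN 0.031:abc}',
            '#x{YM abc:12345}',
            '#x{YM 1222:abc}',
        ]
    }
)

labels_that_abc_is_right = ['LA', 'CN']
labels_that_abc_is_left = ['YM', 'BC']

# Sets make the label lookups cheap even with hundreds of IDs
right_set = set(labels_that_abc_is_right)
left_set = set(labels_that_abc_is_left)

# Assumed cell layout: "#x{<LABEL> <left>:<right>}" (hypothetical pattern name)
CELL_RE = re.compile(r'\{(\w+) ([^:}]*):([^}]*)\}')


def should_drop(cell: str) -> bool:
    """Return True if the cell violates the rule for its label (hypothetical helper)."""
    m = CELL_RE.search(cell)
    if m is None:
        return False  # cells that do not fit the assumed layout are kept
    label, left, right = m.groups()
    if label in right_set and right == 'abc':
        return True  # e.g. LA/CN with abc right of the colon
    if label in left_set and left == 'abc':
        return True  # e.g. YM/BC with abc left of the colon
    return False


# map() yields a boolean Series; ~ keeps the rows that should NOT be dropped
result = df[~df['a'].map(should_drop)]
print(result)
```

This avoids building one large alternation regex, at the cost of calling a Python function per row.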

I have tried the solutions from these two answers: answer1 and answer2. I know that I probably need to use df.a.str.contains with a regex, but I still can't get it to work.
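For reference, Series.str.contains returns one boolean per row, and negating that mask with ~ keeps only the rows that do not match. A minimal sketch, using a simplified pattern chosen only for illustration:

```python
import pandas as pd

s = pd.Series(['#x{LA 0.098:abc}', '#x{LA abc:0.31}'])

# One boolean per row; regex=True is already the default
mask = s.str.contains(r':abc\b', regex=True)

# ~mask flips the booleans, so indexing keeps only the non-matching rows
kept = s[~mask]

print(mask.tolist())  # [True, False]
print(kept.tolist())  # ['#x{LA abc:0.31}']
```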

Answer:

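Turn your criteria into regular expressions first, then filter the data. Each pattern matches one of the labels followed, somewhere later in the string, by :abc (abc right of the colon) or abc: (abc left of the colon):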

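regex_right = r'\b(?:LA|CN)\b.*\b:abc\b'  # LA/CN rows where abc follows the colon
regex_left = r'\b(?:YM|BC)\b.*\babc:\b'   # YM/BC rows where abc precedes the colon
# Non-capturing groups (?:...) avoid the pandas warning about match groups in str.contains
df[~(df['a'].str.contains(regex_right, regex=True) | df['a'].str.contains(regex_left, regex=True))]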
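With hundreds of IDs, typing the alternation by hand is impractical. A sketch, assuming the two lists from the question, that generates the same patterns with re.escape and str.join (the helper name alternation is hypothetical):

```python
import re

import pandas as pd

df = pd.DataFrame(
    {
        'a': [
            '#x{LA 0.098:abc}',
            '#x{LA abc:0.31}',
            '#x{BC abc:0.1231}',
            '#x{LA 0.333:abc}',
            '#x{CN 0.031:abc}',
            '#x{YM abc:12345}',
            '#x{YM 1222:abc}',
        ]
    }
)

labels_that_abc_is_right = ['LA', 'CN']
labels_that_abc_is_left = ['YM', 'BC']


def alternation(labels):
    """Join labels into a non-capturing group, escaping any regex metacharacters."""
    return '(?:' + '|'.join(map(re.escape, labels)) + ')'


# Same patterns as in the answer, generated from the lists instead of typed by hand
regex_right = r'\b' + alternation(labels_that_abc_is_right) + r'\b.*\b:abc\b'
regex_left = r'\b' + alternation(labels_that_abc_is_left) + r'\b.*\babc:\b'

to_drop = df['a'].str.contains(regex_right) | df['a'].str.contains(regex_left)
result = df[~to_drop]
print(result)
```

Note that the \b anchors next to the colon assume a word character on the other side of it, which holds for the sample values but may not for every real value.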