This is my dataframe:
import pandas as pd

df = pd.DataFrame(
    {
        'a': [
            '#x{LA 0.098:abc}',
            '#x{LA abc:0.31}',
            '#x{BC abc:0.1231}',
            '#x{LA 0.333:abc}',
            '#x{CN 0.031:abc}',
            '#x{YM abc:12345}',
            '#x{YM 1222:abc}',
        ]
    }
)
I have two lists of IDs that decide which rows to delete, based on the position of "abc" relative to the colon, that is, whether "abc" is on the right or the left side of the colon.
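To make the rule concrete, here is a minimal sketch that extracts, for each row, the label and which side of the colon "abc" sits on. It assumes every value follows the '#x{LABEL left:right}' shape of the sample; the column names label, left, right and abc_side are illustrative choices, not part of the original data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [
    '#x{LA 0.098:abc}', '#x{LA abc:0.31}', '#x{BC abc:0.1231}',
    '#x{LA 0.333:abc}', '#x{CN 0.031:abc}', '#x{YM abc:12345}',
    '#x{YM 1222:abc}',
]})

# Assumed shape: '#x{' + LABEL + ' ' + left + ':' + right + '}'
parts = df['a'].str.extract(r'\{(?P<label>\w+) (?P<left>[^:]*):(?P<right>[^}]*)\}')

# 'left' if abc precedes the colon, 'right' if it follows it, 'none' otherwise
parts['abc_side'] = np.select(
    [parts['left'] == 'abc', parts['right'] == 'abc'],
    ['left', 'right'],
    default='none',
)
print(parts[['label', 'abc_side']])
```

A row is deleted when its label is in labels_that_abc_is_right and abc_side is 'right', or its label is in labels_that_abc_is_left and abc_side is 'left'.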
These are my lists:
labels_that_abc_is_right = ['LA', 'CN']
labels_that_abc_is_left = ['YM', 'BC']
For example, I want to drop rows that contain LA where "abc" is on the right side of the colon. The same applies to CN. I want to drop rows that contain YM where "abc" is on the left side of the colon, and the same applies to BC. This is just a sample; I have hundreds of IDs.
This is the output I want after deleting the rows:
                 a
1  #x{LA abc:0.31}
6  #x{YM 1222:abc}
I have tried the solutions from these two answers: answer1 and answer2. I know I probably need to use df.a.str.contains with a regex, but I still can't get it to work.
Answer:
Turn your criteria into regular expressions first, then filter the data with them:
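# Label, then anything, then ":abc" (abc to the right of the colon)
regex_right = r'\b(?:LA|CN)\b.*:abc\b'
# Label, then anything, then "abc:" (abc to the left of the colon)
regex_left = r'\b(?:YM|BC)\b.*\babc:'
df[~(df['a'].str.contains(regex_right, regex=True) | df['a'].str.contains(regex_left, regex=True))]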
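Since the real lists hold hundreds of IDs, the two patterns can be generated from the lists rather than typed by hand. This is a sketch under the assumption that the IDs are plain strings; alternation is a hypothetical helper name. re.escape protects against IDs containing regex metacharacters, and the non-capturing group (?:...) avoids the pandas UserWarning about match groups in str.contains.

```python
import re

import pandas as pd

df = pd.DataFrame({'a': [
    '#x{LA 0.098:abc}', '#x{LA abc:0.31}', '#x{BC abc:0.1231}',
    '#x{LA 0.333:abc}', '#x{CN 0.031:abc}', '#x{YM abc:12345}',
    '#x{YM 1222:abc}',
]})

labels_that_abc_is_right = ['LA', 'CN']
labels_that_abc_is_left = ['YM', 'BC']


def alternation(labels):
    """Join IDs into 'A|B|C'; return a never-matching pattern for an empty list."""
    if not labels:
        return r'(?!)'
    return '|'.join(re.escape(label) for label in labels)


regex_right = rf'\b(?:{alternation(labels_that_abc_is_right)})\b.*:abc\b'
regex_left = rf'\b(?:{alternation(labels_that_abc_is_left)})\b.*\babc:'

# True for every row that should be deleted
to_drop = df['a'].str.contains(regex_right) | df['a'].str.contains(regex_left)
result = df[~to_drop]
print(result)
```

The empty-list guard matters because an empty alternation would turn the pattern into a bare word boundary followed by .*, which matches almost every row.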