I have a pandas dataframe and want to filter down to all the rows that contain a certain criteria in the “Title” column. The rows I want to filter down to are all rows that contain the format “(Axx)” (Where xx are 2 numbers). The data in the “Title” column doesn’t just consist of “(Axx)” data. The data in the “Title” column looks like so:
“some_string (Axx)”
What Ive been playing around a bit with different methods but cant seem to get it. I think the closest ive gotten is:
df.filter(regex=r'(D\d{2})', axis=0))
but its not correct as the entries aren’t being filtered.
CodePudding user response:
Use Series.str.contains
with escape ()
and $
for end of string and filter in boolean indexing
:
df = pd.DataFrame({'Title':['(D89)','aaa (D71)','(D5)','(D78) aa','D72']})
print (df)
Title
0 (D89)
1 aaa (D71)
2 (D5)
3 (D78) aa
df1 = df[df['Title'].str.contains(r'\(D\d{2}\)$')]
print (df1)
4 D72
Title
0 (D89)
1 aaa (D71)
If ned match only (Dxx)
use Series.str.match
:
df2 = df[df['Title'].str.match(r'\(D\d{2}\)$')]
print (df2)
Title
0 (D89)