Home > other >  filtering pandas dataframe when data contains two parts
filtering pandas dataframe when data contains two parts

Time:12-08

I have a pandas dataframe and want to filter down to all the rows that contain a certain criteria in the “Title” column. The rows I want to filter down to are all rows that contain the format “(Axx)” (Where xx are 2 numbers). The data in the “Title” column doesn’t just consist of “(Axx)” data. The data in the “Title” column looks like so:

“some_string (Axx)”

What Ive been playing around a bit with different methods but cant seem to get it. I think the closest ive gotten is:

df.filter(regex=r'(D\d{2})', axis=0))

but its not correct as the entries aren’t being filtered.

CodePudding user response:

Use Series.str.contains with escape () and $ for end of string and filter in boolean indexing:

df = pd.DataFrame({'Title':['(D89)','aaa (D71)','(D5)','(D78) aa','D72']})
print (df)
       Title
0      (D89)
1  aaa (D71)
2       (D5)
3   (D78) aa
    
df1 = df[df['Title'].str.contains(r'\(D\d{2}\)$')]
print (df1)
4        D72
       Title
0      (D89)
1  aaa (D71)

If ned match only (Dxx) use Series.str.match:

df2 = df[df['Title'].str.match(r'\(D\d{2}\)$')]
print (df2)
   Title
0  (D89)
  • Related