I have a pandas DataFrame with really long text in one column. I want to select all rows where that column contains ABC, which I was able to do with the following:
df[df['Column'].str.contains('ABC', na=False)]
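For reference, here is what that filter does, as a minimal sketch with made-up data. str.contains returns True/False for each row, and na=False makes missing values count as non-matches. Without it, the mask would hold NaN for those rows, and indexing the frame with such a mask typically raises a ValueError.

```python
import pandas as pd

# Hypothetical data: one match, one non-match, one missing value
df = pd.DataFrame({'Column': ['xx ABC12345 yy', 'no code here', None]})

# One boolean per row; na=False turns the missing value into False
mask = df['Column'].str.contains('ABC', na=False)
print(mask.tolist())

# Boolean indexing keeps only the rows where the mask is True
matching_rows = df[mask]
print(matching_rows)
```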
After that, I want to extract from this field every value made up of the prefix plus the next 5 characters. So after finding a matching row, I would want to get ABC1234 or ABC7899.
I hope this makes sense.
Answer:
You can use str.extract with a regular expression that captures ABC followed by 5 digits:
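import pandas as pd
df = pd.DataFrame({'Column': ['ABC12345 is in this column', 'Not in this one CCD11111', 'Also in this one ABC99882']})
# Raw string keeps \d intact; expand=False returns a Series for the single group
df['capture'] = df['Column'].str.extract(r'(ABC\d{5})', expand=False)
# Rows without a match get NaN in 'capture'; drop only those rows
df = df.dropna(subset=['capture'])
print(df)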
Output:
                       Column   capture
0  ABC12345 is in this column  ABC12345
2   Also in this one ABC99882  ABC99882
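The question asks for all values, but str.extract returns only the first match in each row. If one long text can contain several codes, str.extractall returns one row per match, indexed by (original row, match number). A minimal sketch, assuming made-up sample data (the two-code string in row 0 is hypothetical):

```python
import pandas as pd

# Hypothetical sample: row 0 holds two codes, row 1 holds none
df = pd.DataFrame({'Column': [
    'ABC12345 first, then ABC67890 later',
    'Not in this one CCD11111',
    'Also in this one ABC99882',
]})

# One row per match; column 0 is the single unnamed capture group
matches = df['Column'].str.extractall(r'(ABC\d{5})')[0]

# Collect every code back into a list per original row
codes = matches.groupby(level=0).agg(list)
print(codes)
```

Rows with no match simply do not appear in codes; assigning it back with df['codes'] = codes aligns on the index and leaves NaN for those rows. Note also that ABC\d{5} matches the first eight characters of a longer run such as ABC1234567; if that matters for your data, appending a negative lookahead, r'(ABC\d{5})(?!\d)', rejects such runs.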