I have a dataframe, where one column contains a tweet. I want to get the rows of this dataframe, where this "tweet" column contains any words that start with "#" and have 2 or more capital letters.
So for example, I want to retrieve rows like these:
- I love coding in python. #CodingSession
- I am not scared of #COVID19 anymore.
However, these would not meet my conditions:
- I love coding in python. #Coding #Session
- I love coding in python. #Codingsession
- I am not scared of #Covid19 anymore.
CodePudding user response:
Try str.contains:
df['Match'] = df['tweet'].str.contains(r'#[A-Z][^A-Z#]*[A-Z]')
print(df)
# Output
                                       tweet  Match
0    I love coding in python. #CodingSession   True
1        I am not scared of #COVID19 anymore   True
2  I love coding in python. #Coding #Session  False
3    I love coding in python. #Codingsession  False
4       I am not scared of #Covid19 anymore.  False
[A-Z] for a capital letter
[^A-Z#]* for anything else except a capital letter or #
[A-Z] and again a capital letter
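One caveat with the pattern above: [^A-Z#]* also matches spaces and punctuation, so the second capital letter can come from a later word, and the hashtag itself has to start with a capital. A minimal sketch of a stricter variant follows. It assumes a hashtag consists only of \w characters (letters, digits, underscore) and that "capital letter" means ASCII A-Z. The sixth row and the names loose, strict and result are made up for illustration. It also shows how to keep only the matching rows, since the question asks for the rows rather than a flag column.

```python
import pandas as pd

df = pd.DataFrame({
    "tweet": [
        "I love coding in python. #CodingSession",
        "I am not scared of #COVID19 anymore.",
        "I love coding in python. #Coding #Session",
        "I love coding in python. #Codingsession",
        "I am not scared of #Covid19 anymore.",
        "I love #Python. Great language",  # hypothetical row, not from the question
    ]
})

# Pattern from the answer: [^A-Z#]* may run past the end of the hashtag
loose = r"#[A-Z][^A-Z#]*[A-Z]"

# Stricter sketch: \w keeps the match inside a single hashtag,
# and the two capitals may appear anywhere within it
strict = r"#\w*[A-Z]\w*[A-Z]"

loose_mask = df["tweet"].str.contains(loose)
strict_mask = df["tweet"].str.contains(strict)

# Boolean indexing returns only the rows whose tweet matches
result = df[strict_mask]
print(result)
```

On the hypothetical sixth row the original pattern matches, because the "G" in "Great" is picked up as the second capital, while the stricter pattern does not. Both patterns agree on the five rows from the question.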