How to check if any word in a string has special characters and conditions in Pandas

Time:02-12

I have a DataFrame in which one column contains tweets. I want to get the rows where this "tweet" column contains any word that starts with "#" and has 2 or more capital letters.

So for example, I want to retrieve rows such as these:

  • I love coding in python. #CodingSession
  • I am not scared of #COVID19 anymore.

However, these would not meet my conditions:

  • I love coding in python. #Coding #Session
  • I love coding in python. #Codingsession
  • I am not scared of #Covid19 anymore.

CodePudding user response:

Try str.contains:

df['Match'] = df['tweet'].str.contains(r'#[A-Z][^A-Z#]*[A-Z]')
print(df)

# Output
                                       tweet  Match
0    I love coding in python. #CodingSession   True
1        I am not scared of #COVID19 anymore   True
2  I love coding in python. #Coding #Session  False
3    I love coding in python. #Codingsession  False
4       I am not scared of #Covid19 anymore.  False
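
Since the question asks for the matching rows rather than a flag column, the boolean result of str.contains can also be used as a mask. The following is a minimal sketch, assuming the DataFrame is rebuilt from the tweets quoted in the question; the names pattern, mask and matching_rows are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the tweets quoted in the question (an assumption
# about what the asker's DataFrame looks like).
df = pd.DataFrame({
    "tweet": [
        "I love coding in python. #CodingSession",
        "I am not scared of #COVID19 anymore.",
        "I love coding in python. #Coding #Session",
        "I love coding in python. #Codingsession",
        "I am not scared of #Covid19 anymore.",
    ]
})

# Same pattern as in the answer above.
pattern = r"#[A-Z][^A-Z#]*[A-Z]"

# na=False treats missing tweets as non-matching, so the mask stays boolean.
mask = df["tweet"].str.contains(pattern, regex=True, na=False)

# Boolean indexing keeps only the rows where the pattern was found.
matching_rows = df[mask]
print(matching_rows)
```

With this sample data, only the first two rows should remain.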
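
Explanation of the pattern:

  • # for the literal hash sign that starts the hashtag
  • [A-Z] for a capital letter
  • [^A-Z#]* for zero or more characters that are neither a capital letter nor #
  • [A-Z] for a second capital letter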
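
One caveat: [^A-Z#]* also matches spaces, so a match can run past the end of the hashtag into later words, and the pattern requires the first character after # to be a capital. If that matters for your data, a word-bounded variant may be closer to the stated condition. This is only a sketch under that reading of the question; strict_pattern and the two extra tweets are made up for illustration.

```python
import pandas as pd

# Two illustrative tweets (not from the question) that show the difference.
extra = pd.Series([
    "Learning #Python is Fun",      # second capital is outside the hashtag
    "Back to work #codingSESSION",  # hashtag starts with a lowercase letter
])

original_pattern = r"#[A-Z][^A-Z#]*[A-Z]"

# (?<!\S)   : the # must be at the start of a word
# [^\s#]*   : stay inside the same word (no whitespace, no second #)
# [A-Z] x2  : at least two capital letters somewhere in that word
strict_pattern = r"(?<!\S)#[^\s#]*[A-Z][^\s#]*[A-Z]"

original_result = extra.str.contains(original_pattern, regex=True)
strict_result = extra.str.contains(strict_pattern, regex=True)

print(pd.DataFrame({
    "tweet": extra,
    "original": original_result,
    "strict": strict_result,
}))
```

On the five tweets from the question, both patterns should give the same result; they differ only on cases like the two above.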

