Python regex for one word

Time:11-09

I have a dataframe from which I'm trying to delete all one-word responses, with or without punctuation, which may also have leading spaces. Most of the values are full, long sentences; below are examples of the kind I am trying to remove.

column
thanks
hello!
really....

My attempt: textonly = re.sub('^.\w \w .$' , " " , df.column)

Error (even though the dtype is string): expected string or bytes-like object

Another attempt, which runs but doesn't change anything :/

textonly = re.sub('^.\w \w .$' , " " , str(df.column))

Please help identify what I'm missing

CodePudding user response:

You can use

df['column'] = df['column'].str.replace(r'^\W*\w+\W*$', '', regex=True)

If by "words" you mean natural-language words, i.e. words consisting only of letters, you may use

df['column'] = df['column'].str.replace(r'^[\W\d_]*[^\W\d_]+[\W\d_]*$', '', regex=True)

The regexes match

  • ^ - start of string
  • \W* - zero or more non-word chars (first pattern)
  • [\W\d_]* - zero or more non-word chars, digits and _ (second pattern)
  • \w+ - one or more word chars (first pattern)
  • [^\W\d_]+ - one or more letters, i.e. chars other than non-word chars, digits and _ (second pattern)
  • \W* or [\W\d_]* - the same trailing chars as allowed at the start
  • $ - end of string.
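
To see how the two patterns differ, here is a small sketch that uses only the standard `re` module on plain strings. The names `ONE_TOKEN`, `ONE_WORD` and the sample values are made up for illustration; `.str.replace(..., regex=True)` applies the same patterns to each element of the column.

```python
import re

# Hypothetical names for the two patterns described above.
ONE_TOKEN = re.compile(r'^\W*\w+\W*$')                 # letters, digits, or _
ONE_WORD = re.compile(r'^[\W\d_]*[^\W\d_]+[\W\d_]*$')  # letters only

# Illustrative sample values, not taken from a real dataset.
samples = ['thanks', ' hello!', 'really....', '2021', 'thanks a lot for the help']

for text in samples:
    # Show whether each pattern treats the value as a one-word response.
    print(repr(text), bool(ONE_TOKEN.search(text)), bool(ONE_WORD.search(text)))
```

The only sample on which the two disagree is '2021': `\w` accepts digits, so the first pattern treats it as one word, while the second requires at least one letter. Note that both replace a matching value with an empty string; they do not drop the row.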

CodePudding user response:

You could also skip regex and instead keep only the strings that still contain a space after stripping leading and trailing whitespace

x = [
    'hej med dig',
    'hej',
]

# keep only the strings that contain a space once stripped, i.e. more than one word
print([s for s in x if ' ' in s.strip()])
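
The same idea can be applied to the dataframe directly. The following is a rough sketch, assuming a column named `column` as in the question and made-up sample rows; it counts whitespace-separated tokens and keeps only rows with more than one.

```python
import pandas as pd

# Illustrative data; the column name follows the question.
df = pd.DataFrame({'column': ['thanks', ' hello!', 'really....',
                              'thanks a lot for the help']})

# str.split() with no arguments splits on runs of whitespace,
# so leading and trailing spaces do not produce extra tokens.
word_counts = df['column'].str.split().str.len()

# Keep only rows with more than one token.
multi_word = df[word_counts > 1].reset_index(drop=True)
print(multi_word)
```

Unlike the `str.replace` approach, this drops the one-word rows instead of leaving empty strings behind. It counts tokens purely by whitespace, so a value such as 'hello !' would count as two tokens and be kept.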