I have a dataframe from which I'm trying to delete all one-word responses, with or without punctuation, possibly with leading spaces. Most of the values are full, long sentences; below are examples of the kind I am trying to remove.
column |
---|
thanks |
hello! |
really.... |
My try
textonly = re.sub('^.\w \w .$' , " " , df.column)
Error (even though the dtype is string): expected string or bytes-like object
Another attempt, which runs without error but doesn't change anything:
textonly = re.sub('^.\w \w .$' , " " , str(df.column))
Please help identify what I'm missing
CodePudding user response:
You can use
df['column'] = df['column'].str.replace(r'^\W*\w+\W*$', '', regex=True)
If you mean natural language words by "words", i.e. only consisting of letters, you may use
df['column'] = df['column'].str.replace(r'^[\W\d_]*[^\W\d_]+[\W\d_]*$', '', regex=True)
The regex matches
^ - start of string
\W* - zero or more non-word chars
[\W\d_]* - zero or more non-word chars, digits and _
\w+ - one or more word chars
[^\W\d_]+ - one or more chars other than non-word chars, digits and _
\W* - zero or more non-word chars
$ - end of string.
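To sanity-check the two patterns outside pandas, here is a minimal sketch using Python's re module on plain strings. The sample values and the helper name is_one_word are made up for illustration; they are not from the question's data.

```python
import re

# Pattern 1: a single run of word chars (letters, digits, _),
# with optional non-word chars (spaces, punctuation) on either side.
ONE_TOKEN = re.compile(r'^\W*\w+\W*$')

# Pattern 2: a single run of letters only; digits, _ and non-word
# chars are allowed on either side but not inside the run.
ONE_LETTER_WORD = re.compile(r'^[\W\d_]*[^\W\d_]+[\W\d_]*$')

def is_one_word(text, pattern):
    """Return True if the whole string matches the given pattern."""
    return pattern.search(text) is not None

# Hypothetical sample values for illustration only.
samples = ['thanks', ' hello!', 'really....', 'ab1cd', 'thank you very much']

for s in samples:
    print(repr(s), is_one_word(s, ONE_TOKEN), is_one_word(s, ONE_LETTER_WORD))
```

The value 'ab1cd' shows where the two patterns differ: it is a single token for the first pattern, but the second pattern rejects it because a digit splits the letters into two runs.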
CodePudding user response:
You could also skip regex and simply check whether the stripped string still contains a space:
x = [
'hej med dig',
'hej',
]
print([s for s in x if ' ' in s.strip()])
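A variation on the same idea, sketched under the assumption that "one word" means a single whitespace-separated token: str.split() with no arguments splits on any run of whitespace, so tabs, repeated spaces and empty strings are handled as well. The function name keep_multiword and the sample list are hypothetical.

```python
def keep_multiword(responses):
    """Keep only strings with at least two whitespace-separated tokens."""
    return [s for s in responses if len(s.split()) > 1]

# Hypothetical sample values for illustration only.
responses = ['hej med dig', 'hej', '  hello!  ', 'really....', '']
print(keep_multiword(responses))  # ['hej med dig']
```

Punctuation stays attached to its token here, so 'hello!' still counts as one word, which matches what the question asks for.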