I am preprocessing some text in Python and would like to remove all text that appears inside double quotes. I am unsure how to do that and would appreciate your help. A minimal reproducible example is below for your reference. Thank you in advance.
x='The frog said "All this needs to get removed" something'
So, pretty much what I want to get is 'The frog said something' by removing the text in the double quotes from x above, and I am not sure how to do that. Thanks once again.
CodePudding user response:
Use regex substitution:
import re
x='The frog said "All this needs to get removed" something'
# Match a quoted segment plus any surrounding whitespace and replace it with one space
res = re.sub(r'\s*"[^"]+"\s*', ' ', x)
print(res)
The frog said something
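If the text can contain several quoted segments, or a quoted segment at the very start or end, the same idea can be wrapped in a small helper. This is only a sketch: the name remove_quoted is made up for illustration, and it assumes quotes are plain ASCII double quotes that are not nested or escaped.

```python
import re

# Same pattern idea as above; [^"]* also matches an empty pair of quotes ("").
QUOTED = re.compile(r'\s*"[^"]*"\s*')

def remove_quoted(text):
    """Replace every double-quoted segment (and surrounding whitespace) with one space."""
    # strip() removes the leftover space when a quoted segment starts or ends the text
    return QUOTED.sub(' ', text).strip()

print(remove_quoted('The frog said "All this needs to get removed" something'))
print(remove_quoted('"Hi" he said "bye" and left'))
```

A lone, unmatched double quote is left untouched, because the pattern requires both an opening and a closing quote.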
CodePudding user response:
If you want to use indexing and slicing:
s='The frog said "All this needs to get removed" something'
# To get the index of both the quotes
[i for i, x in enumerate(s) if x == '"']
#[14, 44]
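# Keep everything before the space preceding the first quote, plus everything after the second quote
s[:13] + s[45:]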
#'The frog said something'
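The indices above are specific to this one string. A possible way to avoid hard-coding them is to look the quote positions up with str.find. This is a rough sketch: the function name remove_first_quoted is hypothetical, and it only removes the first quoted segment.

```python
def remove_first_quoted(s):
    """Remove the first double-quoted segment from s using find() and slicing."""
    start = s.find('"')            # index of the opening quote, or -1
    end = s.find('"', start + 1)   # index of the closing quote, or -1
    if start == -1 or end == -1:
        return s                   # no complete quoted segment, leave s unchanged
    # Join the text before and after the quotes with a single space
    return (s[:start].rstrip() + ' ' + s[end + 1:].lstrip()).strip()

s = 'The frog said "All this needs to get removed" something'
print(remove_first_quoted(s))
```

For strings with more than one quoted segment, the regex approach is likely simpler than repeating this lookup in a loop.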