I have the following code:
output = requests.get(url=url, auth=oauth, headers=headers, data=payload)
output_data = output.content
type(output_data)
<class 'bytes'>
output_data
Squeezed Text (3632 Lines)
When looking at the squeezed text, I have some values that look like this:
Steve likes to walk his dog. Steve says to John "I like \n Pineapple, oranges, \n and pizza.\n" and then he went to bed \n.
John likes his beer cold.\n
Sally likes her teeth brushed with a bottle of jack.\n
How can I remove the \n characters, but ONLY when they are contained within double quotes, so that my results look like this:
Steve likes to walk his dog. Steve says to John "I like Pineapple, oranges, and pizza." and then he went to bed \n.
John likes his beer cold.\n
Sally likes her teeth brushed with a bottle of jack.\n
I know how to remove \n characters, but I am not sure how to do this if I only want to remove them when they are contained within double quotes.
Here is what I have tried:
I found this, and used this code:
my_text = re.sub(r'"\\n"','',my_text)
But it doesn't seem to be working.
CodePudding user response:
I might be complicating it a bit, but something like this might work:
parts = content.split("\"")
for i, part in enumerate(parts):
    if i % 2:
        parts[i] = part.replace("\n", "")
content = "\"".join(parts)
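For anyone who wants to try this out, here is a minimal runnable sketch of the same split-and-join idea. The function name `strip_newlines_in_quotes` and the sample string are made up for illustration, and it assumes `content` is already a `str` (not bytes) with balanced, unescaped double quotes.

```python
def strip_newlines_in_quotes(content: str) -> str:
    # After splitting on '"', the text that was inside quotes sits at the
    # odd indices. This assumes the quotes are balanced and never escaped.
    parts = content.split('"')
    for i in range(1, len(parts), 2):
        parts[i] = parts[i].replace("\n", "")
    return '"'.join(parts)


# Hypothetical sample: newlines inside and outside a quoted span.
sample = 'He said "a\nb\n" and left.\nNext line.\n'
result = strip_newlines_in_quotes(sample)
print(repr(result))
```

If the input has an odd number of double quotes, everything after the last quote is treated as quoted, so it is worth checking the data first.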
CodePudding user response:
Figured it out.
Steps:
- Convert bytes to String
- Create the pattern for Regex
- Use regex to format the values.
Step 1:
my_text = my_text.decode("utf-8")
Step 2:
pattern = re.compile(r'".*?"', re.DOTALL)
Step 3:
my_text = pattern.sub(lambda x: x.group().replace('\n', ''), my_text)
This solves my problem.
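Putting the three steps together, a self-contained sketch might look like the following. The `raw` bytes value is a made-up stand-in for the response content, and the variable names are only illustrative.

```python
import re

# Hypothetical sample standing in for the bytes returned by the request.
raw = b'She said "a\nb" then left.\nNext.\n'

# Step 1: bytes -> str
text = raw.decode("utf-8")

# Step 2: non-greedy match of each double-quoted span;
# re.DOTALL lets '.' match newlines so a span can cross lines.
quoted = re.compile(r'".*?"', re.DOTALL)

# Step 3: strip newlines only inside each matched span.
cleaned = quoted.sub(lambda m: m.group().replace("\n", ""), text)
print(repr(cleaned))
```

Note that this only removes the newline character itself. Any spaces around it are kept, so `"I like \n Pineapple"` ends up with two spaces between the words.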