Using regex to combine lines that start with quotation marks

Time:11-21

I would like to join two lines that are separated by a single line feed \n. Sometimes the next line starts with a quotation mark. I am trying to use this code to join them, with \" to match the quotation mark:

comb_nextline = re.sub(r'(?<=[^\.][A-Za-z,-])\n[ ]*(?=[a-zA-Z0-9\(\"])', ' ', txt)

but it doesn't work when the next line starts with a quotation mark. Is there any way to combine lines that start with quotation marks? Thanks!

My txt looks like this:

import re
 
txt= '''
The first process, called wafer bumping, involves a reflow solder process to form the solder balls on all of the input/output
(I/O) pads on the wafer. Because of the extremely small geometries involved, in some instances this process is best accomplished in a hydrogen atmosphere. RTC offers a high temperature furnace for this application, equipped with the hydrogen package, providing a re-flow process in a 100 hydrogen atmosphere. For a second process, called 
"chip joining", RTC offers both a near infrared or forced convection oven.
'''

comb_nextline = re.sub(r'(?<=[^\.][A-Za-z,-])\n[ ]*(?=[a-zA-Z0-9\(\"])', ' ', txt)
print(comb_nextline)

And I hope to get this

txt = '''
The first process, called wafer bumping, involves a reflow solder process to form the solder balls on all of the input/output (I/O) pads on the wafer. Because of the extremely small geometries involved, in some instances this process is best accomplished in a hydrogen atmosphere. RTC offers a high temperature furnace for this application, equipped with the hydrogen package, providing a re-flow process in a 100 hydrogen atmosphere. For a second process, called "chip joining", RTC offers both a near infrared or forced convection oven.
'''

CodePudding user response:

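In your text, the line before the quotation mark ends with a trailing space (after "called"), so the lookbehind, which requires a letter, comma, or hyphen immediately before the newline, fails at that position. You can also match optional spaces before the newline: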

(?<=[^.][A-Za-z,-]) *\n *(?=[a-zA-Z0-9(\"])


Or match all whitespace except newlines using the negated character class [^\S\n]:

(?<=[^.][A-Za-z,-])[^\S\n]*\n[^\S\n]*(?=[a-zA-Z0-9(\"])

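As a rough illustration of how the two patterns differ, the sketch below uses a made-up sample (not the text from the question) in which the first line ends with a tab followed by a space. The names sample, spaces_only, and any_horizontal are only for this example.

```python
import re

# Hypothetical sample: the first line ends with a tab and a space before the newline.
sample = 'a second process, called\t \n"chip joining", RTC offers'

# Variant 1: only literal spaces are allowed around the newline.
spaces_only = re.compile(r'(?<=[^.][A-Za-z,-]) *\n *(?=[a-zA-Z0-9("])')

# Variant 2: any whitespace except a newline is allowed around the newline.
any_horizontal = re.compile(r'(?<=[^.][A-Za-z,-])[^\S\n]*\n[^\S\n]*(?=[a-zA-Z0-9("])')

joined_spaces = spaces_only.sub(' ', sample)
joined_any = any_horizontal.sub(' ', sample)

print(repr(joined_spaces))  # unchanged: the tab blocks the lookbehind
print(repr(joined_any))     # the two lines are joined with a single space
```

If the input only ever contains plain spaces, the first variant should be enough; the second is the safer choice when tabs may appear at line ends.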

import re

txt = '''
The first process, called wafer bumping, involves a reflow solder process to form the solder balls on all of the input/output
(I/O) pads on the wafer. Because of the extremely small geometries involved, in some instances this process is best accomplished in a hydrogen atmosphere. RTC offers a high temperature furnace for this application, equipped with the hydrogen package, providing a re-flow process in a 100 hydrogen atmosphere. For a second process, called 
"chip joining", RTC offers both a near infrared or forced convection oven.
'''

comb_nextline = re.sub(r'(?<=[^.][A-Za-z,-]) *\n *(?=[a-zA-Z0-9(\"])', ' ', txt)
print(comb_nextline)

Output

The first process, called wafer bumping, involves a reflow solder process to form the solder balls on all of the input/output (I/O) pads on the wafer. Because of the extremely small geometries involved, in some instances this process is best accomplished in a hydrogen atmosphere. RTC offers a high temperature furnace for this application, equipped with the hydrogen package, providing a re-flow process in a 100 hydrogen atmosphere. For a second process, called "chip joining", RTC offers both a near infrared or forced convection oven.
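
To see why the pattern from the question skipped the quoted line, a quick check like the following can help. It is only a sketch: sample is a shortened stand-in for the question's text, and original and fixed are the two patterns shown above.

```python
import re

# Shortened stand-in for the question's text; note the space after "called".
sample = 'For a second process, called \n"chip joining", RTC offers'

original = r'(?<=[^\.][A-Za-z,-])\n[ ]*(?=[a-zA-Z0-9\(\"])'
fixed = r'(?<=[^.][A-Za-z,-]) *\n *(?=[a-zA-Z0-9(\"])'

# Inspect the end of the first line: the trailing space is easy to miss.
print(repr(sample.splitlines()[0][-3:]))  # 'ed '

# The original lookbehind sees the space right before \n and fails;
# the fixed pattern consumes the space first, so the lookbehind sees "d".
original_hits = len(re.findall(original, sample))
fixed_hits = len(re.findall(fixed, sample))
print(original_hits, fixed_hits)  # 0 1
```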