Using Regex to combine two lines

Time:11-21

I would like to use a regex to combine two lines. If a line contains only one word and is followed by a single \n, it should be joined with the next line. The single-word line may also end with a comma (Review,) or contain a hyphen (Audit-Related).

My text looks like this:

import re

text = '''
Critical
Accounting Policies and Estimates 
Review,
Approval or Ratification of Transactions with Related Persons 

Audit-Related
Fees are fees for assurance and related services by the principal accountant that are traditionally performed by the principal accountant and which are reasonably related to the performance of the audit or review of the registrant s financial statements and fees attributed to the audit of Guskin Gold Corporation, our wholly owned subsidiary. 

Effective risk oversight is an important priority of the Board of Directors. Because risks are considered in virtually every business decision, the Board of Directors discusses risk throughout the year generally or in connection with specific proposed actions. The Board of Directors approach to risk oversight includes understanding the critical risks in the Company s business and strategy, evaluating the Company s risk management processes, allocating responsibilities for risk oversight among the full Board of Directors, and fostering an appropriate culture of integrity and compliance with legal responsibilities. 

Corporate
Governance 

The
Company promotes accountability for adherence to honest and ethical conduct; endeavors to provide full, fair, accurate, timely and understandable disclosure in reports and documents that the Company files with the SEC and in other public communications made by the Company; and strives to be compliant with applicable governmental laws, rules and regulations. The Company has not formally adopted a written code of business conduct and ethics that governs the Company s employees, officers and Directors as the Company is not required to do so. 
'''

combine = re.sub(r'((?=[A-Za-z,-])\n(?=[a-zA-Z]))', ' ', text) 
print(combine)

I tried the following code to combine them, but it didn't work:

combine = re.sub(r'((?=[A-Za-z,-])\n(?=[a-zA-Z]))', ' ', text) 

I would like the result to look like this:

text = '''
Critical Accounting Policies and Estimates 
Review, Approval or Ratification of Transactions with Related Persons 

Audit-Related Fees are fees for assurance and related services by the principal accountant that are traditionally performed by the principal accountant and which are reasonably related to the performance of the audit or review of the registrant s financial statements and fees attributed to the audit of Guskin Gold Corporation, our wholly owned subsidiary. 

Effective risk oversight is an important priority of the Board of Directors. Because risks are considered in virtually every business decision, the Board of Directors discusses risk throughout the year generally or in connection with specific proposed actions. The Board of Directors approach to risk oversight includes understanding the critical risks in the Company s business and strategy, evaluating the Company s risk management processes, allocating responsibilities for risk oversight among the full Board of Directors, and fostering an appropriate culture of integrity and compliance with legal responsibilities. 

Corporate Governance 

The Company promotes accountability for adherence to honest and ethical conduct; endeavors to provide full, fair, accurate, timely and understandable disclosure in reports and documents that the Company files with the SEC and in other public communications made by the Company; and strives to be compliant with applicable governmental laws, rules and regulations. The Company has not formally adopted a written code of business conduct and ethics that governs the Company s employees, officers and Directors as the Company is not required to do so. 
'''

How can I write the code to combine them? Thanks!

CodePudding user response:

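Thanks to Wiktor's comment! The first lookaround must be a lookbehind, (?<=...), rather than a lookahead, (?=...), so that it checks the character before the \n: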

combine = re.sub(r'((?<=[A-Za-z,-])\n(?=[a-zA-Z]))', ' ', text) 
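To see why the original pattern never matched, note that a lookahead placed before \n tests the newline character itself, which is never a letter, comma, or hyphen. The sketch below compares the two patterns on a shortened sample (the sample string and variable names are made up for illustration). It also shows a stricter variant, which is my own assumption rather than part of the accepted fix: it anchors at the start of the line so that only lines consisting of exactly one word are joined. The lookbehind version only works on the text above because every multi-word line happens to end with a trailing space.

```python
import re

# Shortened, hypothetical sample in the same shape as the question's text.
# Multi-word lines end with a trailing space, as in the original.
sample = (
    "Critical\nAccounting Policies \n"
    "Review,\nApproval of Transactions \n\n"
    "Audit-Related\nFees are fees. \n"
)

# Original attempt: the first group is a lookahead, so it inspects the
# character at the newline position, which is "\n". It can never match.
broken = re.sub(r'(?=[A-Za-z,-])\n(?=[a-zA-Z])', ' ', sample)

# Fix from the answer: a lookbehind inspects the character before "\n".
fixed = re.sub(r'(?<=[A-Za-z,-])\n(?=[a-zA-Z])', ' ', sample)

# Stricter variant (an assumption, not from the answer): the whole line must
# be a single word, optionally hyphenated, optionally followed by a comma.
ONE_WORD_LINE = re.compile(
    r'^([A-Za-z]+(?:-[A-Za-z]+)*,?)\n(?=[A-Za-z])', re.MULTILINE
)
strict = ONE_WORD_LINE.sub(r'\1 ', sample)

# The two working patterns differ when a multi-word line has no trailing space.
wrapped = "two words\nNext line"
joined_by_lookbehind = re.sub(r'(?<=[A-Za-z,-])\n(?=[a-zA-Z])', ' ', wrapped)
joined_by_strict = ONE_WORD_LINE.sub(r'\1 ', wrapped)

print(fixed)
```

If the input may contain multi-word lines without trailing whitespace, the anchored variant is probably the safer choice, since it does not depend on that accident of formatting.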