I would like to use regex
to combine two lines. If the first line has only one word and is followed by one \n
, then combine it with next line. The first line sometimes may have a word and a comma ,
or a word with hyphen -
My text looks like this:
import re
text = '''
Critical
Accounting Policies and Estimates
Review,
Approval or Ratification of Transactions with Related Persons
Audit-Related
Fees are fees for assurance and related services by the principal accountant that are traditionally performed by the principal accountant and which are reasonably related to the performance of the audit or review of the registrant s financial statements and fees attributed to the audit of Guskin Gold Corporation, our wholly owned subsidiary.
Effective risk oversight is an important priority of the Board of Directors. Because risks are considered in virtually every business decision, the Board of Directors discusses risk throughout the year generally or in connection with specific proposed actions. The Board of Directors approach to risk oversight includes understanding the critical risks in the Company s business and strategy, evaluating the Company s risk management processes, allocating responsibilities for risk oversight among the full Board of Directors, and fostering an appropriate culture of integrity and compliance with legal responsibilities.
Corporate
Governance
The
Company promotes accountability for adherence to honest and ethical conduct; endeavors to provide full, fair, accurate, timely and understandable disclosure in reports and documents that the Company files with the SEC and in other public communications made by the Company; and strives to be compliant with applicable governmental laws, rules and regulations. The Company has not formally adopted a written code of business conduct and ethics that governs the Company s employees, officers and Directors as the Company is not required to do so.
'''
combine = re.sub(r'((?=[A-Za-z,-])\n(?=[a-zA-Z]))', ' ', text)
print(combine)
I tried to use following code to combine them, but it didn't work.
combine = re.sub(r'((?=[A-Za-z,-])\n(?=[a-zA-Z]))', ' ', text)
I hope it looks like this finally:
text = '''
Critical Accounting Policies and Estimates
Review, Approval or Ratification of Transactions with Related Persons
Audit-Related Fees are fees for assurance and related services by the principal accountant that are traditionally performed by the principal accountant and which are reasonably related to the performance of the audit or review of the registrant s financial statements and fees attributed to the audit of Guskin Gold Corporation, our wholly owned subsidiary.
Effective risk oversight is an important priority of the Board of Directors. Because risks are considered in virtually every business decision, the Board of Directors discusses risk throughout the year generally or in connection with specific proposed actions. The Board of Directors approach to risk oversight includes understanding the critical risks in the Company s business and strategy, evaluating the Company s risk management processes, allocating responsibilities for risk oversight among the full Board of Directors, and fostering an appropriate culture of integrity and compliance with legal responsibilities.
Corporate Governance
The Company promotes accountability for adherence to honest and ethical conduct; endeavors to provide full, fair, accurate, timely and understandable disclosure in reports and documents that the Company files with the SEC and in other public communications made by the Company; and strives to be compliant with applicable governmental laws, rules and regulations. The Company has not formally adopted a written code of business conduct and ethics that governs the Company s employees, officers and Directors as the Company is not required to do so.
'''
How could I write the code to combine them? Thanks!
CodePudding user response:
Thanks for Wiktor's comment! The code should be
combine = re.sub(r'((?<=[A-Za-z,-])\n(?=[a-zA-Z]))', ' ', text)