How to extract particular paragraph in text file using python?

Time:09-28

I need to extract particular paragraphs from a text file, each starting with "SUBSTITUTION OF TRUSTEE" and ending with "under said Deed of Trust".

  1. Because the same fields are repeated elsewhere in the file, I need to find data only within those paragraphs.

  2. The data might be things like a date, a document number, etc.

sample.txt
Inst #: 2021
Fees: $42.00

06/24/2021 06:54:48 AM
Receipt #: 4587188

Requestor:
FINANCIAL CORPORATION OF
After recording return to: Src: MAIL

Mail Tax Statements to:

SUBSTITUTION OF TRUSTEE
AND DEED OF RECONVEYANCE

The undersigned, Financial Corporation of Nevada, a Nevada Corporation, as the Owner and
Holder of the Note secured by Deed of Trust dated March 1, 2013 made by Elvia Bello, Trustor, to
Official Records -- HEREBY substitutes Financial Corporation of Nevada, a Nevada Corporation,
as Trustee in lieu of the Trustee therein.


Said Note, together with all other indebtedness secured by said Deed of Trust, has been fully paid 
satisfied; and as successor Trustee, the undersigned does hereby RECONVEY WITHOUT
WARRANTY TO THE PERSON OR PERSONS LEGALLY ENTITLED THERETO, all the estate now
held by it under said Deed of Trust.
This JNO aay of June 2021,
Financial Corporation
wy luo Rtn rae
import re
mylines = []

pattern = re.compile(r"SUBSTITUTION OF TRUSTEE", re.IGNORECASE)
with open(r'sample.txt', 'rt', encoding='utf-8') as myfile:
    for line in myfile:                 
            mylines.append(line)
    for line in mylines:
        if(line == "SUBSTITUTION OF TRUSTEE "):
            print(line)
            break
        else:
            mylines.remove(line)
    
    print("my lines",mylines)

CodePudding user response:

Here is a naïve line-by-line method to accomplish what you intend:

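extracted_lines = []
extract = False

# A context manager closes the file once the loop is done
with open("sample.txt", encoding="utf-8") as f:
    for line in f:
        lowered = line.lower()

        # Start collecting at the first line containing the start marker
        if not extract and "substitution of trustee" in lowered:
            extract = True

        if extract:
            extracted_lines.append(line)
            # Stop collecting once the end marker has been added
            if "under said deed of trust" in lowered:
                extract = False  # or break to stop after the first paragraph

print("".join(extracted_lines))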
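If the whole file fits in memory, a single regular expression is another option. The following is only a sketch under that assumption: `extract_paragraph` and the inline `sample` string are made up for illustration, and the `\s+` between words is there so that a marker wrapped across a line break would still match.

```python
import re

# Non-greedy match from the start marker to the first end marker.
# re.DOTALL lets "." cross newlines; re.IGNORECASE ignores letter case.
PARAGRAPH_RE = re.compile(
    r"SUBSTITUTION\s+OF\s+TRUSTEE.*?under\s+said\s+Deed\s+of\s+Trust\.?",
    re.DOTALL | re.IGNORECASE,
)


def extract_paragraph(text):
    """Return the first marked span in text, or None if a marker is missing."""
    match = PARAGRAPH_RE.search(text)
    return match.group(0) if match else None


# Shortened stand-in for the contents of sample.txt
sample = (
    "Mail Tax Statements to:\n"
    "\n"
    "SUBSTITUTION OF TRUSTEE\n"
    "AND DEED OF RECONVEYANCE\n"
    "held by it under said Deed of Trust.\n"
    "This JNO aay of June 2021,\n"
)

print(extract_paragraph(sample))
```

For the real file, read it with `open("sample.txt", encoding="utf-8").read()` inside a `with` block and pass the result to `extract_paragraph`. Use `PARAGRAPH_RE.findall(text)` instead if the file may contain several such paragraphs.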

CodePudding user response:

You can check whether each line starts with "SUBSTITUTION OF TRUSTEE" and, once it is found, set a flag variable to True. While the flag is true, keep appending lines to the mylines list. Once you reach the line containing "under said Deed of Trust", stop adding lines and print the result:

mylines = []
flag = False
with open(r'sample.txt', 'rt', encoding='utf-8') as myfile:
    for line in myfile:
        if line.strip().upper().startswith("SUBSTITUTION OF TRUSTEE"):
            flag = True  # set, not toggle, so a repeated heading cannot switch collection off
        if flag:
            mylines.append(line)
            if "under said deed of trust" in line.strip().lower():
                break

print("".join(mylines))


Output:

SUBSTITUTION OF TRUSTEE
AND DEED OF RECONVEYANCE

The undersigned, Financial Corporation of Nevada, a Nevada Corporation, as the Owner and
Holder of the Note secured by Deed of Trust dated March 1, 2013 made by Elvia Bello, Trustor, to
Official Records -- HEREBY substitutes Financial Corporation of Nevada, a Nevada Corporation,
as Trustee in lieu of the Trustee therein.


Said Note, together with all other indebtedness secured by said Deed of Trust, has been fully paid 
satisfied; and as successor Trustee, the undersigned does hereby RECONVEY WITHOUT
WARRANTY TO THE PERSON OR PERSONS LEGALLY ENTITLED THERETO, all the estate now
held by it under said Deed of Trust.
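Once the paragraph is isolated, fields such as the date can be searched for inside it alone, so values repeated elsewhere in the file (for example the recording timestamp in the header) are ignored. A rough sketch, assuming dates in the paragraph are written like "March 1, 2013"; `DATE_RE` and `find_dates` are hypothetical names, and other date formats would need their own patterns:

```python
import re

# Assumed date format: full month name, day, comma, four-digit year
DATE_RE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{1,2},\s+\d{4}\b"
)


def find_dates(paragraph):
    """Return every date of the assumed format found in paragraph."""
    return DATE_RE.findall(paragraph)


# One line taken from the extracted paragraph in the output above
paragraph = (
    "Holder of the Note secured by Deed of Trust dated March 1, 2013 "
    "made by Elvia Bello, Trustor, to\n"
)

print(find_dates(paragraph))
```

Passing the joined `mylines` text instead of the single line above would search the whole extracted paragraph.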