Python to Find-replace a string and Create Two Paragraphs Before String in Words Document-CodePudding

I have a VBA Macro. In that, I have

.Find Text = 'Pollution'
.Replacement Text = '^p^pChemical'

Here, '^p^pChemical' means Replace the Word Pollution with Chemical and create two empty paragraphs before the word sea.

Before:

After:

Have you noticed that The Word Pollution has been replaced With Chemical and two empty paragraphs preceds it ? This is how I want in Python.

My Code so far:

import docx
from docx import Document
    document = Document('Example.docx')
    for Paragraph in document.paragraphs:
        if 'Pollution' in paragraph:
             replace(Pollution, Chemical)
        document.add_paragraph(before('Chemical'))
        document.add_paragraph(before('Chemical'))

I want to open a word document to find the word, replace it with another word, and create two empty paragraphs before the replaced word.

CodePudding user response：

You can search through each paragraph to find the word of interest, and call insert_paragraph_before to add the new elements:

def replace(doc, target, replacement):
   for par in list(document.paragraphs):
        text = par.text
        while (index := text.find(target)) != -1:
            par.insert_paragraph_before(text[:index].rstrip())
            par.insert_paragraph_before('')
            par.text = replacement   text[index   len(target)]

list(doc.paragraphs) makes a copy of the list, so that the iteration is not thrown off when you insert elements.

Call this function as many times as you need to replace whatever words you have.

CodePudding user response：

This will take the text from the your document, replace the instances of the word pollution with chemical and add paragraphs in between, but it doesn't change the first document, instead it creates a copy. This is probably the safer route to go anyway.

import re
from docx import Document

ref = {"Pollution":"Chemicals", "Ocean":"Sea", "Speaker":"Magnet"}

def get_old_text():
    doc1 = Document('demo.docx')
    fullText = []
    for para in doc1.paragraphs:
        fullText.append(para.text)
    text = '\n'.join(fullText)
    return text


def create_new_document(ref, text):
    doc2 = Document()
    lines = text.split('\n')
    for line in lines:
        for k in ref:
            if k.lower() in line.lower():
                parts = re.split(f'{k}', line, flags=re.I)
                doc2.add_paragraph(parts[0])
                for part in parts[1:]:
                    doc2.add_paragraph('')
                    doc2.add_paragraph('')
                    doc2.add_paragraph(ref[k]   " "   part)
    doc2.save('demo.docx')


text = get_old_text()
create_new_document(ref, text)

CodePudding user response：

You need to use \n for new line. Using re should work like so:

import re

before = "The term Pollution means the manifestation of any unsolicited foregin substance in something. When we talk about pollution on earth, we refer to the contamination that is happening of the natural resources by various pollutants"
pattern = re.compile("pollution", re.IGNORECASE)
after = pattern.sub("\n\nChemical", before)
print(after)

Which will output:

The term 

Chemical means the manifestation of any unsolicited foregin substance in something. When we talk about 

Chemical on earth, we refer to the contamination that is happening of the natural resources by various pollutants