I have a docx file that I opened in PyCharm using textract. The docx contains text with multiple paragraphs. I want to detect every paragraph break and store each paragraph as a string, either in separate variables or in a list, for later use.
How can I do that in Python 3?
Please help!
I haven't found anything on this.
CodePudding user response:
You can achieve that by using Document from the python-docx package (imported as docx):
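# Requires the python-docx package: pip install python-docx
from docx import Document
document = Document('path/to/your/file.docx')
# One string per paragraph, in document order
paragraphs = [para.text for para in document.paragraphs]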
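
If installing python-docx is not an option, the same idea can be sketched with only the standard library, since a .docx file is a zip archive whose main text lives in word/document.xml. This is a rough sketch, not a replacement for python-docx: the helper name read_paragraphs is made up for illustration, it skips empty paragraphs, it also picks up paragraphs inside tables, and it ignores tabs and manual line breaks.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the <w:p> and <w:t> elements
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_paragraphs(path_or_file):
    """Return the non-empty paragraphs of a .docx file as a list of strings.

    read_paragraphs is a hypothetical helper, not part of any library.
    """
    # A .docx file is a zip archive; the body text is in word/document.xml
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    # Every <w:p> element is one paragraph
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more <w:t> runs
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        if text.strip():  # skip paragraphs that are empty or whitespace only
            paragraphs.append(text)
    return paragraphs
```

Calling read_paragraphs('path/to/your/file.docx') should then give a list with one string per paragraph, which you can index or loop over later.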