I am new to C# and OpenXml. I need help reading a .docx file and storing each paragraph in an array.
I am using OpenXml to read a Word (.docx) file. I was able to read the file and print it, but I could only print the paragraphs concatenated together. I couldn't find a way to store each paragraph in an array of strings. (In Python, the docx library gives you the paragraphs as a list; I am looking for something similar.)
using System;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            OpenWordprocessingDocumentReadonly(@"E:\WordDocTest\Test.docx");
        }

        public static void OpenWordprocessingDocumentReadonly(string filepath)
        {
            // Open a WordprocessingDocument based on a filepath.
            using (WordprocessingDocument wordDocument =
                WordprocessingDocument.Open(filepath, false))
            {
                // Assign a reference to the existing document body.
                Body body = wordDocument.MainDocumentPart.Document.Body;
                Console.WriteLine(body.InnerText);
                wordDocument.Close();
            }
        }
    }
}
Test.docx looks like this:
1. Test
This is Test 1.
Test1 part a.
2. noTest
This is Test2.
The output I got was: TestThis is Test 1.Test1 part a.noTestThis is Test 2.
What I want to learn is how to store each paragraph or line in an array of strings and iterate through that array.
CodePudding user response:
You can avoid using arrays and instead unleash the wonderful power of OpenXml combined with LINQ and lists. If you want to work with paragraphs, you could collect them like this (this needs using System.Linq; and the result is a lazy sequence until you call ToList() on it):
var paras = body.OfType<Paragraph>();
You can then expand on this to return specific elements using Where, for example:
var paras = body.OfType<Paragraph>()
    .Where(p => p.ParagraphProperties != null &&
                p.ParagraphProperties.ParagraphStyleId != null &&
                p.ParagraphProperties.ParagraphStyleId.Val.Value.Contains("Heading1"))
    .ToList();
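If you do want the text of each paragraph as an array of strings, as the question asks, one way is to project each Paragraph to its InnerText. The following is a minimal sketch rather than a definitive implementation: the ReadParagraphs helper name is made up for illustration, and the file path is the one from the question. It only looks at paragraphs that are direct children of the body, so paragraphs inside tables would be skipped unless you switch to Descendants<Paragraph>().

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace ConsoleApp1
{
    class Program
    {
        // Hypothetical helper (not part of the OpenXml SDK): returns the text
        // of every top-level paragraph in the document body, one string each.
        static string[] ReadParagraphs(string filepath)
        {
            // Open the document read-only; the using block disposes it.
            using (WordprocessingDocument wordDocument =
                WordprocessingDocument.Open(filepath, false))
            {
                Body body = wordDocument.MainDocumentPart.Document.Body;

                // Elements<Paragraph>() yields the direct child paragraphs;
                // InnerText concatenates the text of the runs in each one.
                return body.Elements<Paragraph>()
                           .Select(p => p.InnerText)
                           .ToArray();
            }
        }

        static void Main(string[] args)
        {
            // Path taken from the question; adjust to your own file.
            string[] paragraphs = ReadParagraphs(@"E:\WordDocTest\Test.docx");

            // Iterate through the array, one paragraph per line.
            foreach (string text in paragraphs)
            {
                Console.WriteLine(text);
            }
        }
    }
}
```

Calling ToArray() inside the using block matters, because it forces the query to run while the document is still open. Judging by the output shown in the question, the "1." and "2." prefixes appear to come from Word's automatic list numbering rather than from the paragraph text, so they would most likely not show up in the strings either.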