Aspose.Words ImportNode ignores font formatting when appending child

I am currently using Aspose.Words to open a document, pull the content between a bookmark start and a bookmark end, and then place that content into another document. The issue I'm having is that the ImportNode method imports the content into my document but changes all of the fonts from Calibri to Times New Roman and changes the font size from whatever it was in the original document to 12pt.

The way I'm obtaining the content from the bookmark is by using the Aspose ExtractContent method.

Because ImportNode was stripping my font formatting, I tried saving each node to an HTML string using ToString(HtmlSaveOptions) instead. This mostly works, but it strips out the returns from the Word document, so none of my text has the appropriate spacing. My returns end up coming in as HTML in the following format

"<p style="margin-top:0pt; margin-bottom:8pt; line-height:108%; font-size:11pt"><span style="font-family:Calibri; display:none; -aw-import:ignore">

When I pass that same HTML string to DocumentBuilder.InsertHtml, it does not correctly add the return to the Word document.

Here is the code I'm using. Please forgive the comments etc.; they reflect my attempts at correcting this.

public async Task<string> GenerateHtmlString(Document srcDoc, ArrayList nodes)
    {
        // Create a blank document.
        Document dstDoc = new Document();
        ELSLogHelper.InsertInfoLog(_callContext, ELSLogHelper.AsposeLogMessage("Open"), MethodBase.GetCurrentMethod()?.Name, MethodBase.GetCurrentMethod().DeclaringType?.Name, Environment.StackTrace);
        // Remove the first paragraph from the empty document.
        dstDoc.FirstSection.Body.RemoveAllChildren();

        // Create a new Builder for the temporary document that gets generated with the header or footer data.
        // This allows us to control font and styles separately from the main document being built.
        var newBuilder = new DocumentBuilder(dstDoc);
        Aspose.Words.Saving.HtmlSaveOptions htmlSaveOptions = new Aspose.Words.Saving.HtmlSaveOptions();
        htmlSaveOptions.ExportImagesAsBase64 = true;
        htmlSaveOptions.SaveFormat = SaveFormat.Html;
        htmlSaveOptions.ExportFontsAsBase64 = true;
        htmlSaveOptions.ExportFontResources = true;
        htmlSaveOptions.ExportTextBoxAsSvg = true;
        htmlSaveOptions.ExportRoundtripInformation = true;
        htmlSaveOptions.Encoding = Encoding.UTF8;

        // Obtain all the links from the source document
        // This is used later to add hyperlinks to the html
        // because by default extracting nodes using Aspose
        // does not pull in the links in a usable way.
        var srcDocLinks = srcDoc.Range.Fields.GroupBy(x => x.DisplayResult).Select(x => x.First()).Where(x => x.Type == Aspose.Words.Fields.FieldType.FieldHyperlink).Distinct().ToList();
        var childNodes = nodes.Cast<Node>().Select(x => x).ToList();

        var oldBuilder = new DocumentBuilder(srcDoc);
        oldBuilder.MoveToBookmark("Header");
        var allchildren = oldBuilder.CurrentParagraph.Runs;

        var allChildNodes = childNodes[0].Document.GetChildNodes(NodeType.Any, true);
        var headerText = allChildNodes[0].Range.Bookmarks["Header"].BookmarkStart.GetText();

        foreach (Node node in nodes)
        {
            var html = node.ToString(htmlSaveOptions);

            try
            {
                // &#xa0; is used by aspose because it works in XML
                // If we see this character and the text of the node is \r we need to insert a break
                if (html.Contains("&#xa0;") && node.Range.Text == "\r")
                {
                    newBuilder.InsertHtml(html, false);
                    // Change the node into an HTML string
                    /*var htmlString = node.ToString(SaveFormat.Html);
                    var tempHtmlLinkDoc = new HtmlDocument();
                    tempHtmlLinkDoc.LoadHtml(htmlString);
                    // Get all the child nodes of the html document
                    var allChildNodes = tempHtmlLinkDoc.DocumentNode.SelectNodes("//*");

                    // Loop over all child nodes so we can make sure we apply the correct font family and size to the break.
                    foreach (var childNode in allChildNodes)
                    {
                        // Get the style attribute from the child node
                        var childNodeStyles = childNode.GetAttributeValue("style", "").Split(';');

                        foreach (var childNodeStyle in childNodeStyles)
                        {
                            // Apply the font name and size to the new builder on the document.
                            if (childNodeStyle.ToLower().Contains("font-family"))
                            {
                                newBuilder.Font.Name = childNodeStyle.Split(':')[1].Trim();
                            }
                            if (childNodeStyle.ToLower().Contains("font-size"))
                            {
                                newBuilder.Font.Size = Convert.ToDouble(childNodeStyle.Split(':')[1]
                                    .Replace("pt", "")
                                    .Replace("px", "")
                                    .Replace("em", "")
                                    .Replace("rem", "")
                                    .Replace("%", "")
                                    .Trim());
                            }
                        }
                    }

                    // Insert the break with the corresponding font size and name.
                    newBuilder.InsertBreak(BreakType.ParagraphBreak);*/
                }
                else
                {
                    // Loop through the source document links so the link can be applied to the HTML.
                    foreach (var srcDocLink in srcDocLinks)
                    {
                        if (html.Contains(srcDocLink.DisplayResult))
                        {
                            // Now that we know the html string has one of the links in it we need to get the address from the node.
                            var linkAddress = srcDocLink.Start.NextSibling.GetText().Replace(" HYPERLINK \"", "").Replace("\"", "");

                            //Convert the node into an HTML String so we can get the correct font color, name, size, and any text decoration.
                            var htmlString = srcDocLink.Start.NextSibling.ToString(SaveFormat.Html);
                            var tempHtmlLinkDoc = new HtmlDocument();
                            tempHtmlLinkDoc.LoadHtml(htmlString);
                            var linkStyles = tempHtmlLinkDoc.DocumentNode.ChildNodes[0].GetAttributeValue("style", "").Split(';');
                            var linkStyleHtml = "";

                            // Accumulate every matching style; "+=" is required here, a plain
                            // assignment would keep only the last matching style.
                            foreach (var linkStyle in linkStyles)
                            {
                                if (linkStyle.ToLower().Contains("color"))
                                {
                                    linkStyleHtml += $"color:{linkStyle.Split(':')[1].Trim()};";
                                }
                                if (linkStyle.ToLower().Contains("font-family"))
                                {
                                    linkStyleHtml += $"font-family:{linkStyle.Split(':')[1].Trim()};";
                                }
                                if (linkStyle.ToLower().Contains("font-size"))
                                {
                                    linkStyleHtml += $"font-size:{linkStyle.Split(':')[1].Trim()};";
                                }
                                if (linkStyle.ToLower().Contains("text-decoration"))
                                {
                                    linkStyleHtml += $"text-decoration:{linkStyle.Split(':')[1].Trim()};";
                                }
                            }


                            // Regex.Escape keeps characters such as '.', '?' or '(' in the display text
                            // from being interpreted as regex syntax.
                            if (linkAddress.ToLower().Contains("mailto:"))
                            {
                                // Since the link has mailto included don't add the target attribute to the link.
                                html = new Regex($@"\b{Regex.Escape(srcDocLink.DisplayResult)}\b").Replace(html, $"<a href=\"{linkAddress}\" style=\"{linkStyleHtml}\">{srcDocLink.DisplayResult}</a>");
                                //html = html.Replace(srcDocLink.DisplayResult, $"<a href=\"{linkAddress}\" style=\"{linkStyleHtml}\">{srcDocLink.DisplayResult}</a>");
                            }
                            else
                            {
                                // Since the link is not an email include the target attribute.
                                html = new Regex($@"\b{Regex.Escape(srcDocLink.DisplayResult)}\b").Replace(html, $"<a href=\"{linkAddress}\" style=\"{linkStyleHtml}\" target=\"_blank\">{srcDocLink.DisplayResult}</a>");
                                //html = html.Replace(srcDocLink.DisplayResult, $"<a href=\"{linkAddress}\" style=\"{linkStyleHtml}\" target=\"_blank\">{srcDocLink.DisplayResult}</a>");
                            }
                        }
                    }

                    // Insert the HTML string into the temporary document.
                    newBuilder.InsertHtml(html, false);
                }
            }
            catch (Exception ex)
            {
                throw;
            }
        }

        // This is just for debugging/troubleshooting purposes and to make sure things look correct
        string tempDocxPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "temp", "TemporaryCompiledDocument.docx");
        dstDoc.Save(tempDocxPath);

        // We generate this HTML file then load it back up and pass the DocumentNode.OuterHtml back to the requesting method.
        ELSLogHelper.InsertInfoLog(_callContext, ELSLogHelper.AsposeLogMessage("Save"), MethodBase.GetCurrentMethod()?.Name, MethodBase.GetCurrentMethod().DeclaringType?.Name, Environment.StackTrace);
        string tempHtmlPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "temp", "TemporaryCompiledDocument.html");
        dstDoc.Save(tempHtmlPath, htmlSaveOptions);

        var tempHtmlDoc = new HtmlDocument();
        tempHtmlDoc.Load(tempHtmlPath);
        var htmlText = tempHtmlDoc.DocumentNode.OuterHtml;

        // Clean up our mess...
        if (File.Exists(tempDocxPath))
        {
            File.Delete(tempDocxPath);
        }

        if (File.Exists(tempHtmlPath))
        {
            File.Delete(tempHtmlPath);
        }

        // Return the generated HTML string.
        return htmlText;
    }

CodePudding user response：

Saving each node to HTML and then inserting it into the destination document is not a good idea, because not all nodes can be properly saved to HTML, and some formatting can be lost in the Aspose.Words DOM -> HTML -> Aspose.Words DOM roundtrip.

Regarding the original issue, the problem might occur because you are using ImportFormatMode.UseDestinationStyles. In this mode the styles and default formatting of the destination document are applied, so the font might change. If you need to keep the source document's formatting, you should use ImportFormatMode.KeepSourceFormatting.

If the problem occurs even with ImportFormatMode.KeepSourceFormatting, this is likely a bug and you should report it to Aspose.Words staff in the support forum.
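
As a rough illustration of the suggestion above, here is a minimal sketch of importing the extracted nodes with ImportFormatMode.KeepSourceFormatting instead of going through HTML. The class and method names (BookmarkContentCopier, CopyNodes) are hypothetical and not part of Aspose.Words. It assumes `nodes` is the list returned by your ExtractContent call and that each entry is a block-level node (a Paragraph or Table) belonging to `srcDoc`.

```csharp
using System.Collections;
using Aspose.Words;

// Hypothetical helper for illustration only; not part of Aspose.Words.
public static class BookmarkContentCopier
{
    // Copies already-extracted nodes into a new document while keeping
    // the source formatting (fonts, sizes, styles).
    public static Document CopyNodes(Document srcDoc, ArrayList nodes)
    {
        // Create a blank document and remove its default empty paragraph.
        Document dstDoc = new Document();
        dstDoc.FirstSection.Body.RemoveAllChildren();

        // Reuse one importer for all nodes so that source styles are
        // translated into the destination document consistently.
        NodeImporter importer = new NodeImporter(
            srcDoc, dstDoc, ImportFormatMode.KeepSourceFormatting);

        foreach (Node node in nodes)
        {
            // 'true' imports the node together with its children (runs, etc.).
            Node imported = importer.ImportNode(node, true);

            // Assumption: the node is block-level, so it can be a child of Body.
            dstDoc.FirstSection.Body.AppendChild(imported);
        }

        return dstDoc;
    }
}
```

If you import a single node, `dstDoc.ImportNode(node, true, ImportFormatMode.KeepSourceFormatting)` should behave the same way. A shared NodeImporter is used here only because several nodes are imported in a loop.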