Home > Net >  How to convert an Excel formatted text to HTML?
How to convert an Excel formatted text to HTML?

Time:07-06

I made a web platform where users can add a long description for a product. The Description field is a wysiwyg editor and the content is saved in HTML in my database (MySQL).

The users rewrote all the descriptions in an Excel file because they have a lot of products (1700 ) and they want to do a bulk import of these texts in the database : 1 cell = 1 description = 1 product.

They formatted the texts (bold, italic, underline, paragraphs...) and I must keep that layout when I will import these descriptions in the database. So, I have to convert these texts into HTML language.

Any idea ?

The only way I found is to :

  1. Copy-paste the cells in a Word file (to keep the layout).
  2. Copy-paste the Word content to an online HTML converter.
  3. Copy-paste the converted text to another Excel file for my bulk import.

It would be really time-consuming for the 1700 products (besides, we have multiple languages...).


EDIT :

[Imagine this text in a cell]

Vestibulum eget viverra nisi.

Maecenas non aliquet dui. Maecenas varius, ante vel pharetra porta, augue lectus accumsan risus, non dapibus orci leo a erat.

Suspendisse varius nisi quis metus semper dapibus. Cras ullamcorper iaculis tortor eget rhoncus. Integer hendrerit vulputate felis

[I want this in a cell]

<p><strong>Vestibulum eget viverra nisi.</strong></p>

<p>Maecenas non aliquet dui. Maecenas varius, ante vel pharetra porta, augue lectus accumsan risus, non dapibus orci leo a erat.</p>

<p><em>Suspendisse varius nisi quis metus semper dapibus.</em> Cras ullamcorper iaculis tortor eget rhoncus. Integer hendrerit vulputate felis</p>

Or maybe this (replace the line breaks with <br /> tags) :

<strong>Vestibulum eget viverra nisi.</strong><br />
<br />
Maecenas non aliquet dui. Maecenas varius, ante vel pharetra porta, augue lectus accumsan risus, non dapibus orci leo a erat.<br />
<br />
<em>Suspendisse varius nisi quis metus semper dapibus.</em> Cras ullamcorper iaculis tortor eget rhoncus. Integer hendrerit vulputate felis

CodePudding user response:

https://excelribbon.tips.net/T013402_Finding_Positions_of_Formatted_Characters_in_a_Cell.html shows how one can find the fontstyle of each character in the string using VBA:
for i = 1 to <length of cell>
<....> = Range("A1").Characters(i, 1).Font.FontStyle
next i
With that capability, is the rest of a solution obvious to you?

CodePudding user response:

With a c# routine, I have been able to determine that when a formatted cell is "copied" in Excel, Excel places an HTML version of it in the Windows clipboard along with some other types like plain text and bitmap graphic.

This is the c# routine to retrieve the HTML from the Windows clipboard (with some help from "Current thread must be set to single thread apartment (STA)" error in copy string to clipboard )

using System.Threading;

namespace SO72854821HTMLfromClipboard
{
    public class HTMLfromClipboard
    {
        public static string getHTMLFromClipboard()
        {
            string htmlString = null;
            Thread STAThread = new Thread(
                delegate ()
                {
                    if (System.Windows.Clipboard.ContainsText(System.Windows.TextDataFormat.Html))
                    {
                        htmlString = System.Windows.Clipboard.GetText(System.Windows.TextDataFormat.Html);
                    }
                });
            STAThread.SetApartmentState(ApartmentState.STA);
            STAThread.Start();
            STAThread.Join();

            return htmlString;
        }

    }
}

When I enter your formatted text in an Excel cell, copy it with Ctrl-C, and then retrieve the value with this method, the result is an HTML representation of your text:

Version:1.0
StartHTML:0000000148
EndHTML:0000002905
StartFragment:0000002319
EndFragment:0000002845
SourceURL:file:///C:/Temp/SO72854821.xlsm

<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40">

<head>
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<meta name=ProgId content=Excel.Sheet>
<meta name=Generator content="Microsoft Excel 15">
<link id=Main-File rel=Main-File
href="file:///C:/Users/hamkchr/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
<link rel=File-List
href="file:///C:/Users/hamkchr/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">
<style>
<!--table
    {mso-displayed-decimal-separator:"\,";
    mso-displayed-thousand-separator:"\.";}
@page
    {margin:.79in .7in .79in .7in;
    mso-header-margin:.3in;
    mso-footer-margin:.3in;}
.font0
    {color:black;
    font-size:10.0pt;
    font-weight:400;
    font-style:normal;
    text-decoration:none;
    font-family:Arial, sans-serif;
    mso-font-charset:0;}
.font5
    {color:black;
    font-size:10.0pt;
    font-weight:700;
    font-style:normal;
    text-decoration:none;
    font-family:Arial, sans-serif;
    mso-font-charset:0;}
.font6
    {color:black;
    font-size:10.0pt;
    font-weight:400;
    font-style:italic;
    text-decoration:none;
    font-family:Arial, sans-serif;
    mso-font-charset:0;}
tr
    {mso-height-source:auto;}
col
    {mso-width-source:auto;}
br
    {mso-data-placement:same-cell;}
td
    {padding-top:1px;
    padding-right:1px;
    padding-left:1px;
    mso-ignore:padding;
    color:black;
    font-size:10.0pt;
    font-weight:400;
    font-style:normal;
    text-decoration:none;
    font-family:Arial, sans-serif;
    mso-font-charset:0;
    mso-number-format:General;
    text-align:general;
    vertical-align:bottom;
    border:none;
    mso-background-source:auto;
    mso-pattern:auto;
    mso-protection:locked visible;
    white-space:nowrap;
    mso-rotate:0;}
.xl65
    {white-space:normal;}
-->
</style>
</head>

<body link="#0563C1" vlink="#954F72">

<table border=0 cellpadding=0 cellspacing=0 width=470 style='border-collapse:
 collapse;width:353pt'>
 <col width=470 style='mso-width-source:userset;mso-width-alt:17188;width:353pt'>
 <tr height=119 style='height:89.25pt'>
<!--StartFragment-->
  <td height=119 class=xl65 width=470 style='height:89.25pt;width:353pt'><font
  >Vestibulum eget viverra nisi.</font><font ><br>
    <br>
    Maecenas non aliquet dui. Maecenas varius, ante vel pharetra porta, augue
  lectus accumsan risus, non dapibus orci leo a erat.<br>
    <br>
    </font><font >Suspendisse varius nisi quis metus semper
  dapibus</font><font >. Cras ullamcorper iaculis tortor eget
  rhoncus. Integer hendrerit vulputate felis</font></td>
<!--EndFragment-->
 </tr>
</table>

</body>

</html>

I have been unable to find a direct way to get this result from the clipboard in VBA for Excel, but I can see the following (admittedly fairly complicated) path to complete the job:

Then, in VBA, in a loop you could define a Range object for each the cells you want to convert, select the cell and copy it, then retrieve the HTML using the c# function and do whatever you need with it.

  • Related