Home > Software design >  How to extract every bold words in a html text
How to extract every bold words in a html text

Time:03-29

I have a text like that.

<div>
    blabla <b> BOLD </b> babla <b> BOLD2</b> blabla
    <table width="100%">
    <tr>
        <td valign="top" style="width:30px">-</td>
       <td>blabla <b>BOLD3</b></td>
  </tr> 
</div>


I want to extract the word between the b tag. But there is several path.

SELECT  t1,t2
       FROM mytext,
            XMLTABLE (
                '/div'
                PASSING xmltype (raw_text.inhalt)
                COLUMNS t2     VARCHAR2 (1000) PATH './b',              --
                        t1     VARCHAR2 (1000) PATH './table/tr/td/b'))

[Error] Execution (11: 1): ORA-19279: XPTY0004 - XQuery dynamic type mismatch: expected singleton sequence - got multi-item sequence

CodePudding user response:

Assuming that your HTML is valid XML (which is not always the case as it needs a single root node and all opening tags to be closed) then given the sample data:

CREATE TABLE mytext (rawtext  CLOB);

INSERT INTO mytext (rawtext) VALUES (
'<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/TR/xhtml1/strict" >
  <div>
    blabla <b> BOLD </b> babla <b> BOLD2</b> blabla
    <table width="100%">
      <tr>
        <td valign="top" style="width:30px">-</td>
        <td>blabla <b>BOLD3</b></td>
      </tr>
    </table>
  </div>
</html>'
);

Then you can use:

SELECT bold
FROM   mytext m
       CROSS JOIN XMLTABLE (
         XMLNAMESPACES(DEFAULT 'http://www.w3.org/TR/xhtml1/strict'),
         '//b'
         PASSING XMLTYPE(m.rawtext)
         COLUMNS bold VARCHAR2(1000) PATH './text()'
       )

Which outputs:

BOLD
BOLD
BOLD2
BOLD3

db<>fiddle here

CodePudding user response:

This version works too. And you don't have to add more html markup. You only have to remove XMLNAMESPACES(DEFAULT 'http://www.w3.org/TR/xhtml1/strict')

CREATE TABLE mytext (
  rawtext  CLOB
);

INSERT INTO mytext (rawtext) VALUES (
'<div>
    blabla <b> BOLD </b> babla <b> BOLD2</b> blabla
    <table width="100%">
      <tr>
        <td valign="top" style="width:30px">-</td>
        <td>blabla <b>BOLD3</b></td>
      </tr>
    </table>
  </div>'
);

SELECT bold
FROM   mytext m
       CROSS JOIN XMLTABLE (
         '//b'
         PASSING XMLTYPE(m.rawtext)
         COLUMNS bold VARCHAR2(1000) PATH './text()'
       )

https://dbfiddle.uk/?rdbms=oracle_21&fiddle=0889004081eb0c86bf3b869d953cf04e

  • Related