Regex to remove HTML Tags, empty lines and blank spaces in sql query-CodePudding

I have a table that has a column feeback which is a free text from front end. This column has values like -

FEEDBACK
-Agent was listening and very attentive.

Agent showed all the houses and gave the right description

Agent was well versed & knew how to speak multiple
languages





&nbsp;
-<p>Agent was well dressed for the event</p>

Since this is copy pasted, there are many spaces or empty lines between two lines sometiems that comes in the backend.

I want to remove all these and show the output like -

FEEDBACK
-Agent was listening and very attentive.
Agent showed all the houses and gave the right description
Agent was well versed & knew how to speak multiple
languages
-Agent was well dressed for the event

For this I use the below query -

select REGEXP_REPLACE(regexp_replace(  regexp_replace(
    regexp_replace(
      DBMS_LOB.SUBSTR(max(feedback),4000),
      /*
        Replace LF followed by any non-printable sequence that ends with newline
        with single newline
      */
      chr(10) || '[^[:graph:]]*(' || chr(13) || '?' || chr(10) || ')',
      chr(10) || '\1'
    ),
    /*Then replace newline repetitions*/
    '(' || chr(13) || '?' || chr(10) || ') ',
    '\1'
  ),'<.*?>'),'&nbsp;') as feedback
  from dual;

Is there any way I can merge these regex_replace and not use multiple regex_replace to cater to my requirement?

CodePudding user response：

Not all can be combined.
But some can, via a regex OR |

And better replace those first then.
As removing them could cause extra empty lines.

SELECT
  REGEXP_REPLACE(
    REGEXP_REPLACE(
      REGEXP_REPLACE(DBMS_LOB.SUBSTR(feedback, 4000)
      , '(&nbsp;)|(<[/[:alpha:]] >)')
      , '[[:space:]] $','',1,0,'m') 
      , '(['||chr(13)||']?['||chr(10)||']){2,}','\1') AS feedback
FROM your_table
ORDER BY feedback DESC
OFFSET 0 ROWS FETCH NEXT 1 ROWS ONLY