How do I replace only a group of empty lines by an empty line?-CodePudding

I want to replace every line break which occur more than 3 time one after another by 2 line breaks.

example:

'

line

line after en empty line


line after 2 empty line
line after 0 empty lines
'

must become that

'
line

line after en empty line
  
line after 2 empty lines
line after 0 empty lines
'

I have tried that

SELECT REGEXP_REPLACE (mytext,'*(' || CHR (10) || ') ',  CHR (10)) FROM DUAL;

but it replace all groups of line breaks by 2 line breaks.

'
line

line after en empty line
  
line after 2 empty lines

line after 0 empty lines
'

I've tried that

SELECT REGEXP_REPLACE (mytext, '{2,}(' || CHR (10) || ') ',  CHR (10) || CHR (10)) FROM DUAL;

because of the documentation (https://www.techonthenet.com/oracle/functions/regexp_replace.php)

{m,} Matches at least m times.

but same result.

How do I replace only a group of empty lines by an empty line?

UPDATE: the proposed code

REGEXP_REPLACE (mytext, CHR(10) || '{2,}',  CHR(10) || CHR(10))

works for fast every situation but not for the following:

'




line after n empty lines
'

becomes

'
    

line after 2 empty lines
'

because the first line don't have chr(10) before.

I changed the pattern to match to take that into account:

SELECT REGEXP_REPLACE ((mytext, CHR(10) || '{2,}) | (' || CHR(10) || '{1,}),  CHR(10) || CHR(10)) FROM DUAL;

But I must add only a line break in this case. How do I write that ?

CodePudding user response：

I want to replace every line break which occur more than 3 time one after another by 2 line breaks.

You can replace instances of three-or-more newlines using:

SELECT REGEXP_REPLACE(value, CHR(10) || '{3,}', CHR(10) || CHR(10))
         AS replaced_value
FROM   table_name;

Which, for the sample data:

CREATE TABLE table_name (value) AS
SELECT 'line

line after an empty line


line after 2 empty line
line after 0 empty lines'
FROM   DUAL;

Outputs:

REPLACED_VALUE

line

line after an empty line

line after 2 empty line
line after 0 empty lines

REPLACED_VALUE
line line after an empty line line after 2 empty line line after 0 empty lines

Note: You do not want to use CHR(10) || '{2,} as that would match instances of two newlines and replace them with two newlines, effectively using lots of effort to do nothing. Start matching from three newlines when there will be an effect.

db<>fiddle here

Update:

If you only want to match one fewer newline characters at the start of the string then you can use:

SELECT REGEXP_REPLACE(
         value,
         '((^|'||CHR(10)||')'||CHR(10)||')'||CHR(10)||' ',
         '\1'
       ) AS replaced_value
FROM   table_name;

Which will match either the start-of-the-string or a newline character then a newline character and put the value of that match into the first capturing group and then match one-or-more more newline characters and replace the entire match with the value of the first capturing group.

Which, for the sample data:

CREATE TABLE table_name (value) AS
SELECT 'line

line after an empty line


line after 2 empty line
line after 0 empty lines'
FROM   DUAL UNION ALL
SELECT '




line after n empty lines
' FROM DUAL;

Outputs:

REPLACED_VALUE

line

line after an empty line

line after 2 empty line
line after 0 empty lines

line after n empty lines

REPLACED_VALUE
line line after an empty line line after 2 empty line line after 0 empty lines
line after n empty lines

db<>fiddle here

CodePudding user response：

You have the counter in the wrong place. It must be after the expression you want repeated, not before it.

You want to replace all line feeds that occur at least twice with a double line feed:

REGEXP_REPLACE (mytext, CHR(10) || '{2,}',  CHR(10) || CHR(10))

Demo: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=f0051da5b892d1b0401c90530bb36e12

As to the first lines where it is one line feed less, the best option may be to add a line feed temporarily (i.e. you remove it after REGEXP_REPLACE with SUBSTR). So the final expression becomes:

SUBSTR(REGEXP_REPLACE (CHR(10) || mytext, CHR(10) || '{2,}',  CHR(10) || CHR(10)), 2)

Demo: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=fd4909a01a2496e449104c35343ca3f1

CodePudding user response：

We need to use CHR(10){2,} to specify that we replace 2 or more occurances of newline with CHR(10) || CHR(10) that is 2 newlines.

create table test (col varchar(1000));

insert into test (col) values
('line

line after en empty line


line after 2 empty line
line after 0 empty lines');

select * from test;

line

line after en empty line


line after 2 empty line
line after 0 empty lines

SELECT REGEXP_REPLACE (col,CHR (10) || '{2,}', CHR (10) || CHR (10) ) FROM test;|

line

line after en empty line

line after 2 empty line
line after 0 empty lines

db<>fiddle here