Home > Blockchain >  SQL: Divide long text in multiple rows
SQL: Divide long text in multiple rows

Time:06-17

I would like to divide a long text in multiple rows; there are other questions similar to this one but none of them worked for me. What I have

ID | Message
----------------------------------
1  | Very looooooooooooooooong text
2  | Short text

What I would like to do is divide that string every n characters Result if n = 15:

Id                    | Message
------------------------------------------
1                     | Very looooooooo
1                     | oooooooong text
2                     | Short text

Even better if the split is done at the first space after n character.

I tried with string_split and substring but I cannot find anything that works. I thought to use something similar to this:

SELECT index, element FROM table, CAST(message AS SUPER) AS element AT index;

But it doesn't take into account the length and I don't like casting a varchar variable into a super.

CodePudding user response:

You can use generate_series() to accomplish this:

  select m.*, gs.posn, substring(m.message, gs.posn, 15) as split_message 
    from messages m
         cross join lateral generate_series(1, length(message), 15) gs(posn);

Splitting on spaces after the length is a little trickier. We would have to split the message into words and then figure out how to break them into groups and then reaggregate.

I could not figure out how to split on spaces without recursion. I hope you don't mind that it treats all whitespace as word boundaries:

with recursive by_words as (
  select m.*, s.n, s.word, length(s.word) as word_len,
         max(s.n) over (partition by m.id) as num_words
    from messages m
         cross join lateral regexp_split_to_table(m.message, '\s ') 
           with ordinality as s(word, n)
), rejoin as (
  select id, n, array[word] as words, word_len as cum_word_len, 
         word_len >= 15 as keep 
    from by_words
   where n = 1
  union all
  select p.id, c.n, 
         case 
           when p.cum_word_len >= 15 then array[c.word]
           else p.words||c.word
         end as words, 
         case 
           when p.cum_word_len >= 15 then c.word_len
           else p.cum_word_len   c.word_len   1
         end as cum_word_len,
         (p.cum_word_len   c.word_len   1 >= 15)
           or (c.n = c.num_words) as keep
    from rejoin p
         join by_words c on (c.id, c.n) = (p.id, p.n   1)
)
select id, 
       row_number() over (partition by id
                              order by n) as segnum,
       array_to_string(words, ' ') as split_message
  from rejoin 
 where keep
 order by 1, 2
;

db<>fiddle here

  • Related