Remove duplicate numbers separated by a symbol in a string using Hive's REGEXP_REPLACE

Time:01-31

I have a Spark DataFrame with a string column that contains numbers separated by semicolons, for example 862;1595;17;862;49;862;19;100;17;49. I would like to remove the duplicate numbers so that only the following remains: 862;1595;17;49;19;100
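To pin down the target behavior (keep the first occurrence of each number, in its original order) independently of any regex, here is a plain Java sketch. The class name DedupReference is hypothetical. It is only a reference for checking candidate patterns against, not a Hive or Spark function.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Reference semantics only (not a Hive/Spark API): split on ';', keep the first
// occurrence of each token in its original order, then join with ';' again.
final class DedupReference {

    static String dedupKeepFirst(String s) {
        // LinkedHashSet silently ignores repeats and preserves insertion order.
        Set<String> seen = new LinkedHashSet<>(Arrays.asList(s.split(";")));
        return String.join(";", seen);
    }

    public static void main(String[] args) {
        // prints 862;1595;17;49;19;100
        System.out.println(dedupKeepFirst("862;1595;17;862;49;862;19;100;17;49"));
    }
}
```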

As far as patterns go, I have tried:

  1. "\\b(\\d+(?:\\.\\d+)?) ([^;]+); (?=.*\\b\\1 \\2\\b)"
  2. (?<=\b\1:.*)\b(\w+):?
  3. \\b( )\\b(?=.*?\\b\1\\b)
  4. (\b[^,]+)(?=.*, *\1(?:,|$)), *

None of these has given me the result I need so far.

Answer:

Try the following query, which removes repeated numbers from a semicolon-separated string column:

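-- Drop a number and its trailing ';' whenever the same whole number
-- appears again later in the string (the last copy of each is kept).
SELECT  regexp_replace
        (
            your_column,
            '(?<=^|;)(?<num>.*?);(?=.*(?<=;)\\k<num>(?=;|$))',
            ''
        ) AS deduped
FROM your_table;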
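Because Hive's regexp_replace (like Spark SQL's) evaluates Java regular expressions, the pattern above can be sanity-checked in plain Java. The sketch below uses a hypothetical class HiveRegexCheck and applies the pattern with replaceAll, which should behave like regexp_replace(s, pattern, ''). Tracing the sample input shows that the pattern removes earlier copies and keeps the last one, so it returns 1595;862;19;100;17;49 rather than the requested 862;1595;17;49;19;100. The keepFirst method is a swapped-in trick, not part of the original answer: reverse the string, apply the same pattern, then reverse back. In SQL that would read reverse(regexp_replace(reverse(your_column), <same pattern>, '')), assuming your engine has a string reverse() function, as Hive and Spark SQL do.

```java
import java.util.regex.Pattern;

final class HiveRegexCheck {

    // The answer's pattern as a Java string literal. The SQL literal '\\k<num>'
    // reaches the regex engine as \k<num>, which is what "\\k<num>" gives here.
    // A token (preceded by start-of-string or ';') is removed together with its
    // trailing ';' when the same whole token appears again later in the string.
    static final Pattern LATER_DUPLICATE =
            Pattern.compile("(?<=^|;)(?<num>.*?);(?=.*(?<=;)\\k<num>(?=;|$))");

    // Mirrors regexp_replace(s, pattern, ''): keeps the LAST copy of each number.
    static String keepLast(String s) {
        return LATER_DUPLICATE.matcher(s).replaceAll("");
    }

    // Hypothetical variant: reverse, dedupe, reverse back, which keeps the FIRST copy.
    // Reversing the characters reverses every token the same way, so equal numbers
    // stay equal and the ';' boundaries are unchanged.
    static String keepFirst(String s) {
        String reversed = new StringBuilder(s).reverse().toString();
        return new StringBuilder(keepLast(reversed)).reverse().toString();
    }

    public static void main(String[] args) {
        String input = "862;1595;17;862;49;862;19;100;17;49";
        System.out.println(keepLast(input));  // 1595;862;19;100;17;49
        System.out.println(keepFirst(input)); // 862;1595;17;49;19;100
    }
}
```

The (?<=;) and (?=;|$) guards around \k<num> make the comparison whole-token, so 86 is not treated as a duplicate of 862.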