I have following column in a postgres table.
col1 start end
p.[A138T;S160G;D221_E222delinsGK] 138 138
p.[A138T;S160G;D221_E222delinsGK] 160 160
p.[A138T;S160G;D221_E222delinsGK] 221 222
I am looking for a way to split the ; separated values in multiple rows. The expected output is:
col1 start end
p.A138T 138 138
p.S160G 160 160
p.D221_E222delinsGK 221 222
I am using following query but it does not works for the third row.
select
case
when start = "end" and col1 like 'p.[%' then 'p.'||(regexp_match(col1, '([A-Z]'||start||'[A-Z])'))[1]
when start != "end" and col1 like 'p.[%' then 'p.'||(regexp_match(col1, '[A-Z\d _delins] '||start||'[A-Z\d _delins] '))[1]
else col1,
start,
end
from table
Any help is highly appreciated!
CodePudding user response:
I prefer substring()
to extract strings based on a regex, so I don't have to deal with the array that regexp_match()
returns.
select case
when col1 like 'p.%' and start = "end"
then 'p.'||substring(col1 from concat('[A-Z]', start, '[A-Z]'))
when col1 like 'p.%' and start <> "end"
then 'p.'||substring(col1 from concat('[A-Z]', start, '_[A-Z]', "end", 'delins[^\]] '))
else col1
end as col1,
start,
"end"
from the_table;
As far as I can tell, the regex for the second alternative adds the _delins
incorrectly to the regex marking the "start" value. And if that is a constant value, it should not be part of a regex "range" because [delins]'
would also match the string sindel
. Not sure if you intended that.
Another option would be to simply take everything after the "end" marker and exclude the closing ]
by using
substring(col1 from concat('[A-Z]', start, '_[A-Z]', "end", '[^\]] '))