I have data like this: a35={"abc":"D1,9,12, 23, 24, 25, 26"}
on a column.
What I want to do is, using Presto, extract only the id (on this case a35
) on a new column if and only if I have "abc":"D1
on the beginning of the json.
So far I am trying to make this regex work for extracting the id, but no success, nothing returns from my query...
--sample data
WITH dataset(id_str) AS (
SELECT ('a35={"abc":"D1,9,12, 23, 24, 25, 26"}')
)
--query
SELECT regexp_extract_all(id_str, '"\b(?<id>\w{3})\=\{\"abc\"\:\"D1\,"')
FROM dataset;
Any ideas?
CodePudding user response:
A completely overkill method for this use case, but it might be useful for more advanced applications:
WITH dataset(id_str) AS (
SELECT ('a35={"abc":"D1,9,12, 23, 24, 25, 26"}')
)
SELECT map_keys(
map_filter(
CAST(
json_parse(regexp_replace(id_str, '(\d|\w*)=(.*)', '{"$1":$2}') -- Converts string to valid JSON.
) as MAP<VARCHAR, JSON>) -- Converts JSON to MAP of JSON
, (k,v) -> starts_with(CAST(json_extract(v, '$.abc') AS VARCHAR), 'D1')) -- Filters only the desired values
) as keys -- extracts the keys.
FROM dataset
Output:
keys |
---|
{a35} |
CodePudding user response:
Actually the answer was correct, I just had two quotes more than I should on the regex. The correct answer is:
--sample data
WITH dataset(id_str) AS (
SELECT ('a35={"abc":"D1,9,12, 23, 24, 25, 26"}')
)
--query
SELECT regexp_extract_all(id_str, '\b(?<id>\w{3})\=\{\"abc\"\:\"D1\,')
FROM dataset;
For reference, I've validated the regex using: https://regex101.com/