I have a table that is similar to the following structure.
ID | LOG |
---|---|
1 | Begin run 'clean_data' |
1 | Task 1 success |
1 | Task 2 success |
2 | Begin run 'store_data' |
2 | Task 1 success |
2 | Task 2 success |
Each run is split into multiple tasks as shown above. I want to create another column in the table with the name of the run for each task that belongs to that run (even just the full first entry for each run would be fine).
i.e
ID | LOG | NAME |
---|---|---|
1 | Begin run 'clean_data' | clean_data |
1 | Task 1 success | clean_data |
1 | Task 2 success | clean_data |
2 | Begin run 'store_data' | store_data |
2 | Task 1 success | store_data |
2 | Task 2 success | store_data |
I currently have the following window function that will get me the first and last row for each run so I know the bounds for each run, but I'm sure this is not the right way to go about it.
select ID, LOG
from (select t.*,
row_number() over (partition by ID order by LOG) as seqnum_asc,
row_number() over (partition by ID order by LOG desc) as seqnum_desc
from combined t
) t
where 1 in (seqnum_asc, seqnum_desc)
order by ID, LOG asc;
Thanks in advance
CodePudding user response:
Try building a helper table, using REGEXP_SUBSTR()
with lookahead and lookback patterns consisting of single quotes, to get "anything within the string that comes after a single quote and before another single quote.
Then, use the helper table for a MERGE statement.
CREATE TABLE indata(id,log,nam)
AS
SELECT 1,'Begin run ''clean_data''' , NULL::VARCHAR(16)
UNION ALL SELECT 1,'Task 1 success' , NULL::VARCHAR(16)
UNION ALL SELECT 1,'Task 2 success' , NULL::VARCHAR(16)
UNION ALL SELECT 2,'Begin run ''store_data''' , NULL::VARCHAR(16)
UNION ALL SELECT 2,'Task 1 success' , NULL::VARCHAR(16)
UNION ALL SELECT 2,'Task 2 success' , NULL::VARCHAR(16)
;
CREATE TABLE updh
AS
SELECT
id
, log
, REGEXP_SUBSTR(log,'(?<=''). (?='')') AS runname
FROM indata
WHERE REGEXP_SUBSTR(log,'(?<=''). (?='')') <> ''
-- test output ...
-- out id | log | runname
-- out ---- ------------------------ ------------
-- out 1 | Begin run 'clean_data' | clean_data
-- out 2 | Begin run 'store_data' | store_data
;
MERGE INTO indata
USING updh
ON updh.id=indata.id
WHEN MATCHED THEN UPDATE
SET nam=runname
;
SELECT * FROM indata;
-- out id | log | nam
-- out ---- ------------------------ ------------
-- out 1 | Begin run 'clean_data' | clean_data
-- out 1 | Task 1 success | clean_data
-- out 1 | Task 2 success | clean_data
-- out 2 | Begin run 'store_data' | store_data
-- out 2 | Task 1 success | store_data
-- out 2 | Task 2 success | store_data
CodePudding user response:
You can use a self-join
with regexp_substr
:
select l.*, regexp_replace(regexp_substr(l1.log, "'\\w '$"), "'", '') from logs l
join logs l1 on l1.id = l.id and l1.log regexp "'\\w '$"