Make window function output first value to all group rows-CodePudding

I have a table that is similar to the following structure.

ID	LOG
1	Begin run 'clean_data'
1	Task 1 success
1	Task 2 success
2	Begin run 'store_data'
2	Task 1 success
2	Task 2 success

Each run is split into multiple tasks as shown above. I want to create another column in the table with the name of the run for each task that belongs to that run (even just the full first entry for each run would be fine).

i.e

ID	LOG	NAME
1	Begin run 'clean_data'	clean_data
1	Task 1 success	clean_data
1	Task 2 success	clean_data
2	Begin run 'store_data'	store_data
2	Task 1 success	store_data
2	Task 2 success	store_data

I currently have the following window function that will get me the first and last row for each run so I know the bounds for each run, but I'm sure this is not the right way to go about it.

select ID, LOG
from (select t.*,
               row_number() over (partition by ID order by LOG) as seqnum_asc,
               row_number() over (partition by ID order by LOG desc) as seqnum_desc
      from combined t
     ) t
where 1 in (seqnum_asc, seqnum_desc)
order by ID, LOG asc;

Thanks in advance

CodePudding user response：

Try building a helper table, using REGEXP_SUBSTR() with lookahead and lookback patterns consisting of single quotes, to get "anything within the string that comes after a single quote and before another single quote.

Then, use the helper table for a MERGE statement.

CREATE TABLE indata(id,log,nam) 
AS 
          SELECT 1,'Begin run ''clean_data''' , NULL::VARCHAR(16)
UNION ALL SELECT 1,'Task 1 success'           , NULL::VARCHAR(16)
UNION ALL SELECT 1,'Task 2 success'           , NULL::VARCHAR(16)
UNION ALL SELECT 2,'Begin run ''store_data''' , NULL::VARCHAR(16)
UNION ALL SELECT 2,'Task 1 success'           , NULL::VARCHAR(16)
UNION ALL SELECT 2,'Task 2 success'           , NULL::VARCHAR(16)
;

CREATE TABLE updh 
AS 
  SELECT
    id
  , log
  , REGEXP_SUBSTR(log,'(?<=''). (?='')') AS runname
  FROM indata
  WHERE  REGEXP_SUBSTR(log,'(?<=''). (?='')') <> ''
  -- test output ...
  -- out  id |          log           |  runname   
  -- out ---- ------------------------ ------------
  -- out   1 | Begin run 'clean_data' | clean_data
  -- out   2 | Begin run 'store_data' | store_data
;

MERGE INTO indata
USING updh
  ON updh.id=indata.id
WHEN MATCHED THEN UPDATE 
 SET nam=runname
;
SELECT * FROM indata;
-- out  id |          log           |    nam     
-- out ---- ------------------------ ------------
-- out   1 | Begin run 'clean_data' | clean_data
-- out   1 | Task 1 success         | clean_data
-- out   1 | Task 2 success         | clean_data
-- out   2 | Begin run 'store_data' | store_data
-- out   2 | Task 1 success         | store_data
-- out   2 | Task 2 success         | store_data

CodePudding user response：

You can use a self-join with regexp_substr:

select l.*, regexp_replace(regexp_substr(l1.log, "'\\w '$"), "'", '') from logs l 
join logs l1 on l1.id = l.id and l1.log regexp "'\\w '$"