Home > Net >  Create column of new fields from daily fields in SQL Presto
Create column of new fields from daily fields in SQL Presto

Time:12-13

I would like to calculate the new visitors based on all the visitors that visit each day. Right now the only data I have available is the first 2 columns, so I need to extrapolate the last 2 columns based on the first 2 columns.

Date Visitors_Today New_Visitors All_Visitors_To_Date
Dec 6 Allie, Jon Allie, Jon Allie, Jon
Dec 7 Allie, Jon, Zach Zach Allie, Jon, Zach
Dec 8 Barb, Jon Barb Allie, Barb, Jon, Zach
Dec 9 Janet, Zach Janet Allie, Barb, Janet, Jon, Zach

This is what I have so far to create the first and second column

WITH visitor_log_response AS (
    SELECT
        CAST(JSON_PARSE(visitor_log) AS MAP<VARCHAR, VARCHAR>) AS visitor_map,
        date
    FROM visitor_log_response_table
),
names_and_dates AS (
    SELECT DISTINCT
        visitor_name AS visitor_name,
        date
    FROM visitor_log_response 
    CROSS JOIN UNNEST(visitor_map) AS u(visitor_name, visitor_age)
),
visitor_names AS (
    SELECT
        date,
        ARRAY_JOIN(
            ARRAY_AGG(
                visitor_name
                ORDER BY
                    visitor_name
            ),
            ','
        ) visitors_today,
    FROM names_and_dates
    GROUP BY
        date
    ORDER BY
        date DESC
)
SELECT
    date,
    visitors_today
FROM visitor_names

Which results in this

Date Visitors_Today
Dec 6 Allie, Jon
Dec 7 Allie, Jon, Zach
Dec 8 Barb, Jon
Dec 9 Janet, Zach

If the table is normalized using this query

SELECT ds, visitors_today_split
FROM previous_table
CROSS JOIN UNNEST(SPLIT(visitors_today),',')) as (visitors_today_split)

I would have this output

Date Visitors_Today
Dec 6 Allie
Dec 6 Jon
Dec 7 Allie
Dec 7 Jon
Dec 7 Zach
Dec 8 Barb
Dec 8 Jon
Dec 9 Janet
Dec 9 Zach

CodePudding user response:

You can flatten your arrays and perform new aggregations in subqueries:

select t.*, (select array_join(array_agg(t1.v), ', ') from 
  (select v from unnest(t.visitors_today) v
   except
   select v from tbl t2 cross join unnest(t2.visitors_today) v 
   where t2.date < t.date) t1),
  (select array_join(array_distinct(array_agg(v)), ', ') 
   from tbl t1 cross join unnest(t1.visitors_today)) v 
   where t1.date <= t.date)
from tbl t

See fiddle (Demo of query above in Postgres).

CodePudding user response:

You can use window functions with array aggregation (remove ARRAY_JOIN from visitor_names CTE):

-- sample data
with dataset(date, visitors_today) as (
    values ('Dec 6', array['Allie', 'Jon']),
        ('Dec 7', array['Allie', 'Jon', 'Zach']),
        ('Dec 8', array['Barb', 'Jon']),
        ('Dec 9', array['Janet', 'Zach'])
)

-- query
select date,
    visitors_today,
    array_distinct(visitors_today || prev_visitors) all_visitors_to_date,
    array_except(visitors_today, prev_visitors) new_visitors    
from (
    select *,
           coalesce(
                flatten(array_distinct(array_agg(visitors_today)
                    over (order by date rows between UNBOUNDED PRECEDING and 1 PRECEDING))),
                array[]) as prev_visitors  -- combine all visitors before today into non null array
    from dataset);

Output:

date visitors_today all_visitors_to_date new_visitors
Dec 6 [Allie, Jon] [Allie, Jon] [Allie, Jon]
Dec 7 [Allie, Jon, Zach] [Allie, Jon, Zach] [Zach]
Dec 8 [Barb, Jon] [Barb, Jon, Allie, Zach] [Barb]
Dec 9 [Janet, Zach] [Janet, Zach, Allie, Jon, Barb] [Janet]

Note that arrays can be not that optimal type in terms of performance and a limited to 10000 elements in Presto/Trino.

  • Related