How to map functions across objects/arrays in SQL?-CodePudding

Say I have some data in table, which has the names and ages of customers who came into a store each day (the id functions as a unique day identifier).

id, obj
-----------------------------------------------------------------
1,  [{'name': 'jenny', 'age':22}, {'name': 'adam', 'age':32},...]
2,  [{'name': 'adam',  'age':32}, {'name': 'beth', 'age':27},...]

What I'd like to do is find the average age per day

SELECT id, AVG([o['age'] for o in obj]) AS avg_age
FROM t

And the desired output

id, avg_age
--, -------
1,  27
2,  29.2

This obviously doesn't work because the python-like list comprehension doesn't work in SQL. What is the proper way to do this in SQL?

CodePudding user response：

Assuming your data is actually json you can manipulate it:

-- sample data
WITH dataset (id, obj) AS (
    VALUES ( 1, '[{"name": "jenny", "age":22}, {"name": "adam", "age":32}]'),
        (2, '[{"name": "adam",  "age":32}, {"name": "beth", "age":27}]')
) 

--query
select id,
    reduce(arr, 0, (s, x) -> s   x, s -> s) / cardinality(arr)
from
    (
        select id,
            transform(
                cast(json_parse(obj) as array(json)), -- parse json and cast to array of json
                j->cast(json_extract_scalar(j, '$.age') as double) -- extract age
            ) arr
        from dataset
    )

Output:

id	_col1
1	27.0
2	29.5

For newer versions you can try replace outer select and reduce with array_average:

select id,
    array_average(transform(
        cast(json_parse(obj) as array(json)),
        j->cast(json_extract_scalar(j, '$.age') as double)
    )) arr
from dataset