Say I have some data in table, which has the names and ages of customers who came into a store each day (the id functions as a unique day identifier).
id, obj
-----------------------------------------------------------------
1, [{'name': 'jenny', 'age':22}, {'name': 'adam', 'age':32},...]
2, [{'name': 'adam', 'age':32}, {'name': 'beth', 'age':27},...]
What I'd like to do is find the average age per day
SELECT id, AVG([o['age'] for o in obj]) AS avg_age
FROM t
And the desired output
id, avg_age
--, -------
1, 27
2, 29.2
This obviously doesn't work because the python-like list comprehension doesn't work in SQL. What is the proper way to do this in SQL?
CodePudding user response:
Assuming your data is actually json you can manipulate it:
-- sample data
WITH dataset (id, obj) AS (
VALUES ( 1, '[{"name": "jenny", "age":22}, {"name": "adam", "age":32}]'),
(2, '[{"name": "adam", "age":32}, {"name": "beth", "age":27}]')
)
--query
select id,
reduce(arr, 0, (s, x) -> s x, s -> s) / cardinality(arr)
from
(
select id,
transform(
cast(json_parse(obj) as array(json)), -- parse json and cast to array of json
j->cast(json_extract_scalar(j, '$.age') as double) -- extract age
) arr
from dataset
)
Output:
id | _col1 |
---|---|
1 | 27.0 |
2 | 29.5 |
For newer versions you can try replace outer select and reduce with array_average
:
select id,
array_average(transform(
cast(json_parse(obj) as array(json)),
j->cast(json_extract_scalar(j, '$.age') as double)
)) arr
from dataset