I have 2 tables: table1 (including id, title, and abstract) and table2 (including id, pubdate, and family_id).
Each id is unique but multiple rows can have the same family_id.
My goal is to display the id, title and pubdate only for the MIN(pubdate) of each family_id.
I tried this:
SELECT
t1.id, t1.title, nt2.pubdate
FROM
(SELECT
id, family_id, MIN(pubdate) AS pubdate
FROM
table2
GROUP BY
family_id) AS nt2
INNER JOIN
table1 t1
ON
t1.id = nt2.id
I get the following error message in BigQuery: "SELECT list expression references column id which is neither grouped nor aggregated at [position]".
I read here that this is due to the fact that all expressions in the respective SELECT list must be either those which are in the GROUP BY or should be with AGGREGATION function.
But, I do need id for merging with table1.
How can I do this?
CodePudding user response:
From your description, seems that you can have one id
with multiple pubdate
in table2
then, you have to pick the id
associated with the oldest pubdate
adding it into the group by
clause and it won't affect the id.
try
SELECT
t1.id, t1.title, nt2.pubdate
FROM
(SELECT
id, family_id, MIN(pubdate) AS pubdate
FROM
table2
GROUP BY
id, family_id) AS nt2
INNER JOIN
table1 t1
ON
t1.id = nt2.id
and you will have your oldest id
, title
and pubdate
for each family_id
(the latter won't appear in your final table as long as you do not put it in the select clause)
CodePudding user response:
In the “Group by” statement, you need to write all the fields you use in the “SELECT” statement. You can see this documentation.
Like this line of code:
GROUP BY Family_id,id
In this case, if the family and id are duplicated in another field, they will group.
Here is an example code:
SELECT
t1.id, t1.title, nt2.pubdate
FROM
(
SELECT
id, family_id, MIN(pubdate) AS pubdate
FROM
Table2 t2
GROUP BY
family_id,id
) AS nt2
Table1 t1
ON
t1.id = nt2.id
order by id asc