Keep columns in the SELECT list that are neither in the GROUP BY nor aggregated

I have two tables: table1 (with id, title, and abstract) and table2 (with id, pubdate, and family_id).

Each id is unique but multiple rows can have the same family_id.

My goal is to display the id, title, and pubdate only for the row with the earliest pubdate (MIN(pubdate)) in each family_id.

I tried this:

SELECT
  t1.id, t1.title, nt2.pubdate
FROM
  (SELECT
     id, family_id, MIN(pubdate) AS pubdate
   FROM
     table2
   GROUP BY
     family_id) AS nt2
INNER JOIN
  table1 t1
ON
  t1.id = nt2.id

I get the following error message in BigQuery: "SELECT list expression references column id which is neither grouped nor aggregated at [position]".

I read that this happens because every expression in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function.

But I do need id to join with table1.

How can I do this?

CodePudding user response：

From your description, it seems that one id can have multiple pubdate values in table2. In that case, add id to the GROUP BY clause: this removes the error and does not change the id values. Try:

SELECT
  t1.id, t1.title, nt2.pubdate
FROM
  (SELECT
     id, family_id, MIN(pubdate) AS pubdate
   FROM
     table2
   GROUP BY
     id, family_id) AS nt2
INNER JOIN
  table1 t1
ON
  t1.id = nt2.id

This gives you the id, title, and earliest pubdate for each (id, family_id) combination. family_id won't appear in the final result as long as you leave it out of the outer SELECT list.
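
Note that if every id in table2 is unique, as the question states, grouping by id and family_id leaves one row per group, so every row comes back. A minimal sketch of one way to get only the earliest row per family_id: group by family_id alone, then join back to table2 on family_id and pubdate to recover the id. It uses the question's table and column names; the alias first_pub is made up for illustration, and it assumes pubdate is a comparable type such as DATE or TIMESTAMP.

```sql
-- Sketch only: earliest publication per family_id.
-- If two rows in a family share the earliest pubdate, both are returned.
SELECT
  t1.id, t1.title, t2.pubdate
FROM
  (SELECT
     family_id, MIN(pubdate) AS pubdate   -- only grouped or aggregated columns
   FROM
     table2
   GROUP BY
     family_id) AS first_pub              -- hypothetical alias
INNER JOIN
  table2 t2
ON
  t2.family_id = first_pub.family_id
  AND t2.pubdate = first_pub.pubdate      -- recovers the id of the earliest row
INNER JOIN
  table1 t1
ON
  t1.id = t2.id
```

Because the inner query selects only family_id and an aggregate, it no longer triggers the "neither grouped nor aggregated" error.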

CodePudding user response：

In the GROUP BY clause, you need to list every non-aggregated column that you use in the SELECT list. You can see this in the documentation.

For example:

GROUP BY family_id, id

In this case, rows that share the same family_id and id are grouped together.

Here is a complete example:

SELECT
  t1.id, t1.title, nt2.pubdate
FROM
  (SELECT
     id, family_id, MIN(pubdate) AS pubdate
   FROM
     table2 t2
   GROUP BY
     family_id, id) AS nt2
INNER JOIN
  table1 t1
ON
  t1.id = nt2.id
ORDER BY
  t1.id ASC
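
If the goal is exactly one row per family_id, a window function is one option that avoids GROUP BY entirely, so id never has to be grouped or aggregated. The sketch below uses the question's table and column names; the alias ranked and the column rn are made up for illustration. It breaks ties on pubdate by id, which is an assumption about which row you would want.

```sql
-- Sketch only: number the rows of each family_id from earliest to latest pubdate,
-- then keep row 1. Rows with a NULL pubdate sort first in ascending order in
-- BigQuery, so filter them out in the inner query if that matters.
SELECT
  t1.id, t1.title, ranked.pubdate
FROM
  (SELECT
     id, pubdate,
     ROW_NUMBER() OVER (PARTITION BY family_id ORDER BY pubdate, id) AS rn
   FROM
     table2) AS ranked                    -- hypothetical alias
INNER JOIN
  table1 t1
ON
  t1.id = ranked.id
WHERE
  ranked.rn = 1
```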