We are working on converting Informatica mappings to Google Bigquery SQL. In one of the mappings, there are a couple ports/columns, say A and B which are not getting grouped by in the Aggregator transformation and neither have been applied any aggregation function like sum, avg etc. According to senior devs in my org, in Informatica, we will get last values of these ports/columns as a result after the aggregator. My question is, how do we convert this behaviour in BigQuery SQL? Because we cannot use that columns in select statement, which are not present in the Group by clause and we don't want to group by these columns. For getting last value of the column, we have LAST_VALUE() analytic function in bigquery, but even then we cannot use the group by and analytic function in same select statement. I would really appreciate some help!
CodePudding user response:
Use some aggregation function.
In Informatica you will get LAST value. This is not deterministic. It basically means that either
- you have same values across all the column,
- you don't care which one you get, or
- you have specific order, on which the last value is taken.
First two cases mean you can use MIN / MAX / whatsoever. The result will be same or you don't care.
If the last one is your case, ARRAY_AGG should help you, as per this answer.
CodePudding user response:
to convert Infa mapping with aggregator to big SQL, I would use row_number over (partitioned by id order by id) as rn
and then in outside put a filter rn=1.
Informatica aggregator - id is group by column.
Equivalent SQL should look like this -
select a,b,id
from
(select a,b,row_number over (partitioned by id order by id desc) as rn --this will mimic informatica aggregator. id column is the group by port. if you have any sorter before aggregator add all ports as per order in order by column on same sequence but reverse order(asc/desc)
from mytable) rs
where rs.rn=1 -- this will ensure to pick latest row.