Home > Mobile >  Query with Left outer join and group by returning duplicates
Query with Left outer join and group by returning duplicates

Time:01-05

To begin with, I have a table in my db that is fed with SalesForce info. When I run this example query it returns 2 rows:

select * from SalesForce_INT_Account__c where ID_SAP_BAYER__c = '3783513'

result of the above query

When I run this next query on the same table I obtain one of the rows, which is what I need:

SELECT MAX(ID_SAP_BAYER__c) FROM  SalesForce_INT_Account__c where ID_SAP_BAYER__c = '3783513' GROUP BY ID_SAP_BAYER__c

result of the above query

Now, I have another table (PedidosEspecialesZarateCabeceras) which has a field (NroClienteDireccionEntrega) that I can match with the field I've been using in the SalesForce table (ID_SAP_BAYER__c). This table has a key that consists of just 1 field (NroPedido).

What I need to do is join these 2 tables to obtain a row from PedidosEspecialesZarateCabeceras with additional fields coming from the SalesForce table, and in case those additional fields are not available, they should come as NULL values, so for that im using a LEFT OUTER JOIN.

The problem is, since I have to match NroClienteDireccionEntrega and ID_SAP_BAYER__c and there's 2 rows in the salesforce table with the same ID_SAP_BAYER__c, my query returns 2 duplicate rows from PedidosEspecialesZarateCabeceras (They both have the same NroPedido).

This is an example query that returns duplicates:

SELECT 
cab.CUIT AS CUIT, 
convert(nvarchar(4000), cab.NroPedido) AS NroPedido,

sales.BillingCity__c as Localidad,
sales.BillingState__c as IdProvincia,
sales.BillingState__c_Desc as Provincia,
sales.BillingStreet__c as Calle,
sales.Billing_Department__c as Distrito,
sales.Name as RazonSocial,

cab.NroCliente as ClienteId

FROM   PedidosEspecialesZarateCabeceras AS cab WITH (NOLOCK) 
             
     
                  LEFT OUTER JOIN

                         SalesForce_INT_Account__c AS sales WITH (NOLOCK) ON 
                         cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c 
                         and sales.ID_SAP_BAYER__c in
                                 ( SELECT MAX(ID_SAP_BAYER__c)

                                            FROM  SalesForce_INT_Account__c 
                                         GROUP BY ID_SAP_BAYER__c
                               )
                                 
WHERE cab.NroPedido ='5320'

Even though the join has MAX and Group By, this returns 2 duplicate rows with different SalesForce information (Because of the 2 salesforce rows with the same ID_SAP_BAYER__c), which should not be possible.

duplicate rows with different salesforce information

What I need is for the left outer join in my query to pick only ONE of the salesforce rows to prevent duplication like its happening right now. For some reason the select max with the group by is not working.

Maybe I should try to join this tables in a different way, can anyone give me some other ideas on how to join the two tables to return just 1 row? It doesnt matter if the SalesForce row that gets picked out of the 2 isn't the correct one, I just need it to pick one of them.

CodePudding user response:

Your IN clause is not actually doing anything, since...

SELECT MAX(ID_SAP_BAYER__c)
FROM  SalesForce_INT_Account__c 
GROUP BY ID_SAP_BAYER__c

... returns all possible IDSAP_BAYER__c values. (The GROUP BY says you want to return one row per unique ID_SAP_BAYER__c and then, since your MAX is operating on exactly one unique value per group, you simply return that value.)

You will want to change your query to operate on a value that is actually different between the two rows you are trying to differentiate (probably the MAX(ID) for the relevant ID_SAP_BAYER__c). Plus, you will want to link that inner query to your outer query.

You could probably do something like:

...
LEFT OUTER JOIN
SalesForce_INT_Account__c sales
 ON cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c 
 and sales.ID in
 ( 
 SELECT MAX(ID)
 FROM SalesForce_INT_Account__c sales2
 WHERE sales2.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega
 )                               
WHERE cab.NroPedido ='5320'

By using sales.ID in ... SELECT MAX(ID) ... instead of sales.ID_SAP_BAYER__c in ... SELECT MAX(ID_SAP_BAYER__c) ... this ensures you only match one of the two rows for that ID_SAP_BAYER__c. The WHERE sales2.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega condition links the inner query to the outer query.

There are multiple ways of doing the above, especially if you don't care which of the relevant rows you match on. You can use the above as a starting point and make it match your preferred style.

An alternative might be to use OUTER APPLY with TOP 1. Something like:

SELECT 
 ...
FROM   PedidosEspecialesZarateCabeceras AS cab
OUTER APPLY(
 SELECT TOP 1 *
 FROM SalesForce_INT_Account__c s1
 WHERE cab.NroClienteDireccionEntrega = s1.ID_SAP_BAYER__c
) sales
WHERE cab.NroPedido ='5320'

Without an ORDER BY the match that TOP 1 chooses will be arbitrary, but I think that's what you want anyway. (If not, you could add an ORDER BY).

  • Related