Home > database >  OPENJSON does not detect duplicated Value
OPENJSON does not detect duplicated Value

Time:06-12

I need to import and parse some sales records into the database.
Some customers have more than one product, therefore there are several orders with different product but with the same shipping address in the JSON Array.

Problem arise when I try to save the address in Addresses Table:
Clearly I need just one record for each Address, and to achieve this I calculate an Hash of the address fields and I compare it with the Hashes already present in the table:

It seems (well, I'm sure) that the query checks only the first time if the Hash is already present, and if it is not present, it add as many rows as the orders count:

I can imagine this is the standard behavior of OPENJSON function since the parsing of the JSON payload seems to be made by an inner loop, therefore I think I have to use a different approach.... but I have no Idea what to use.

here the JSON payload

declare @json nvarchar(max)=N'[
   {
      "id": 21660,
      "currency": "USD",
      "total": "15.00",
      "shipping": {
         "first_name": "Charles",
         "last_name": "Leuschke",
         "address_1": "3121 W Olive Ave",
         "city": "Burbank",
         "state": "CA",
         "postcode": "91505",
         "country": "US"
      },
      "line_items": [
         {
            "id": 1052
         }
      ]
   },
   {
      "id": 21659,
      "currency": "USD",
      "total": "38.00",
      "shipping": {
         "first_name": "Charles",
         "last_name": "Leuschke",
         "address_1": "3121 W Olive Ave",
         "city": "Burbank",
         "state": "CA",
         "postcode": "91505",
         "country": "US"
      },
      "line_items": [
         {
            "id": 1050
         }
      ]
   },
   {
      "id": 21658,
      "currency": "USD",
      "total": "38.00",
      "shipping": {
         "first_name": "Charles",
         "last_name": "Leuschke",
         "address_1": "3121 W Olive Ave",
         "city": "Burbank",
         "state": "CA",
         "postcode": "91505",
         "country": "US"
      },
      "line_items": [
         {
            "id": 1048
         }
      ]
   }
]'

and the (simplified) query

Insert Into @Addresses
(
    orderId,
    fullName,
    addressLine1,
    city,
    stateOrProvince,
    postalCode,
    countryCode,
    addressCode
)
SELECT
    o.orderId,
    concat(s.firstName,' ',s.lastName),
    s.addressLine1,
    s.city,
    s.stateOrProvince,
    s.postalCode,
    s.countryCode,
    convert(nvarchar(64),hashbytes('SHA1',concat(s.firstName, ' ', s.lastName, s.addressLine1, s.city, s.stateOrProvince, s.postalCode, s.countryCode)),2)
FROM OPENJSON(@json) 
    WITH  (
            orderId                     nvarchar(64)    '$.id',
            shipping                    nvarchar(max)   '$.shipping' AS JSON
          ) o  
    CROSS APPLY OPENJSON(shipping) 
    WITH  (
            firstName                   nvarchar(128)   '$.first_name',
            lastName                    nvarchar(128)   '$.last_name',
            addressLine1                nvarchar(128)   '$.address_1',
            city                        nvarchar(128)   '$.city',
            stateOrProvince             nvarchar(64)    '$.state',
            postalCode                  nvarchar(64)    '$.postcode',
            countryCode                 nvarchar(4)     '$.country'
         ) s

left join @Addresses a on a.addressCode=convert(nvarchar(64),hashbytes('SHA1',concat(s.firstName,' ', s.lastName, s.addressLine1, s.city, s.stateOrProvince, s.postalCode, s.countryCode)),2)
where a.addressCode is null

I also prepared a sqlfiddle, where you can see that it returns 3 rows while target is to get just one

https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=3031a99e3cd24f6bf383c0de29cf19a1

CodePudding user response:

Just use the standard ROW_NUMBER() deduplication method e.g.


WITH cte AS (
    SELECT
    o.orderId,
    concat(s.firstName,' ',s.lastName) fullName,
    s.addressLine1,
    s.city,
    s.stateOrProvince,
    s.postalCode,
    s.countryCode,
    h.addressCode,
    ROW_NUMBER() OVER (PARTITION BY h.addressCode ORDER BY o.OrderId ASC) rn
    FROM OPENJSON(@json) 
    WITH  (
        orderId nvarchar(64) '$.id',
        shipping nvarchar(max) '$.shipping' AS JSON
    ) o  
    CROSS APPLY OPENJSON(shipping) 
    WITH  (
        firstName nvarchar(128) '$.first_name',
    lastName nvarchar(128) '$.last_name',
    addressLine1 nvarchar(128) '$.address_1',
    city nvarchar(128) '$.city',
    stateOrProvince nvarchar(64) '$.state',
    postalCode nvarchar(64) '$.postcode',
    countryCode nvarchar(4) '$.country'
    ) s
    CROSS APPLY (
      VALUES (CONVERT(nvarchar(64), HASHBYTES('SHA1', CONCAT(s.firstName ,' ', s.lastName, s.addressLine1, s.city, s.stateOrProvince, s.postalCode, s.countryCode)),2))
    ) h (addressCode)
    LEFT JOIN @Addresses a ON a.addressCode = h.AddressCode
    WHERE a.addressCode IS NULL
)
INSERT INTO @Addresses
(
  orderId,
  fullName,
  addressLine1,
  city,
  stateOrProvince,
  postalCode,
  countryCode,
  addressCode
)
     SELECT orderId,
         fullName,
         addressLine1,
         city,
         stateOrProvince,
         postalCode,
         countryCode,
         addressCode
    FROM cte
    WHERE rn = 1;

Note: If you use CROSS APPLY to calculate the hashcode one avoids calculating it multiple times.

  • Related