I need to import and parse some sales records into the database.
Some customers have more than one product, therefore there are several orders with different product but with the same shipping address in the JSON
Array
.
Problem arise when I try to save the address in Addresses
Table:
Clearly I need just one record for each Address, and to achieve this I calculate an Hash
of the address fields and I compare it with the Hashes already present in the table:
It seems (well, I'm sure) that the query checks only the first time if the Hash
is already present, and if it is not present, it add as many rows as the orders count:
I can imagine this is the standard behavior of OPENJSON
function since the parsing of the JSON
payload seems to be made by an inner loop, therefore I think I have to use a different approach.... but I have no Idea what to use.
here the JSON
payload
declare @json nvarchar(max)=N'[
{
"id": 21660,
"currency": "USD",
"total": "15.00",
"shipping": {
"first_name": "Charles",
"last_name": "Leuschke",
"address_1": "3121 W Olive Ave",
"city": "Burbank",
"state": "CA",
"postcode": "91505",
"country": "US"
},
"line_items": [
{
"id": 1052
}
]
},
{
"id": 21659,
"currency": "USD",
"total": "38.00",
"shipping": {
"first_name": "Charles",
"last_name": "Leuschke",
"address_1": "3121 W Olive Ave",
"city": "Burbank",
"state": "CA",
"postcode": "91505",
"country": "US"
},
"line_items": [
{
"id": 1050
}
]
},
{
"id": 21658,
"currency": "USD",
"total": "38.00",
"shipping": {
"first_name": "Charles",
"last_name": "Leuschke",
"address_1": "3121 W Olive Ave",
"city": "Burbank",
"state": "CA",
"postcode": "91505",
"country": "US"
},
"line_items": [
{
"id": 1048
}
]
}
]'
and the (simplified) query
Insert Into @Addresses
(
orderId,
fullName,
addressLine1,
city,
stateOrProvince,
postalCode,
countryCode,
addressCode
)
SELECT
o.orderId,
concat(s.firstName,' ',s.lastName),
s.addressLine1,
s.city,
s.stateOrProvince,
s.postalCode,
s.countryCode,
convert(nvarchar(64),hashbytes('SHA1',concat(s.firstName, ' ', s.lastName, s.addressLine1, s.city, s.stateOrProvince, s.postalCode, s.countryCode)),2)
FROM OPENJSON(@json)
WITH (
orderId nvarchar(64) '$.id',
shipping nvarchar(max) '$.shipping' AS JSON
) o
CROSS APPLY OPENJSON(shipping)
WITH (
firstName nvarchar(128) '$.first_name',
lastName nvarchar(128) '$.last_name',
addressLine1 nvarchar(128) '$.address_1',
city nvarchar(128) '$.city',
stateOrProvince nvarchar(64) '$.state',
postalCode nvarchar(64) '$.postcode',
countryCode nvarchar(4) '$.country'
) s
left join @Addresses a on a.addressCode=convert(nvarchar(64),hashbytes('SHA1',concat(s.firstName,' ', s.lastName, s.addressLine1, s.city, s.stateOrProvince, s.postalCode, s.countryCode)),2)
where a.addressCode is null
I also prepared a sqlfiddle, where you can see that it returns 3 rows while target is to get just one
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=3031a99e3cd24f6bf383c0de29cf19a1
CodePudding user response:
Just use the standard ROW_NUMBER()
deduplication method e.g.
WITH cte AS (
SELECT
o.orderId,
concat(s.firstName,' ',s.lastName) fullName,
s.addressLine1,
s.city,
s.stateOrProvince,
s.postalCode,
s.countryCode,
h.addressCode,
ROW_NUMBER() OVER (PARTITION BY h.addressCode ORDER BY o.OrderId ASC) rn
FROM OPENJSON(@json)
WITH (
orderId nvarchar(64) '$.id',
shipping nvarchar(max) '$.shipping' AS JSON
) o
CROSS APPLY OPENJSON(shipping)
WITH (
firstName nvarchar(128) '$.first_name',
lastName nvarchar(128) '$.last_name',
addressLine1 nvarchar(128) '$.address_1',
city nvarchar(128) '$.city',
stateOrProvince nvarchar(64) '$.state',
postalCode nvarchar(64) '$.postcode',
countryCode nvarchar(4) '$.country'
) s
CROSS APPLY (
VALUES (CONVERT(nvarchar(64), HASHBYTES('SHA1', CONCAT(s.firstName ,' ', s.lastName, s.addressLine1, s.city, s.stateOrProvince, s.postalCode, s.countryCode)),2))
) h (addressCode)
LEFT JOIN @Addresses a ON a.addressCode = h.AddressCode
WHERE a.addressCode IS NULL
)
INSERT INTO @Addresses
(
orderId,
fullName,
addressLine1,
city,
stateOrProvince,
postalCode,
countryCode,
addressCode
)
SELECT orderId,
fullName,
addressLine1,
city,
stateOrProvince,
postalCode,
countryCode,
addressCode
FROM cte
WHERE rn = 1;
Note: If you use CROSS APPLY
to calculate the hashcode one avoids calculating it multiple times.