How to remove duplicate entries from json string in SQL Server-CodePudding

I've checked stackoverflow as well as google but could not find any solution. What would be the proper way to remove duplicate entries from an nvarchar field that contains json string in SQL Server? For my case, let say I have nvarchar 'People' field on my table which contains the following data.

[
 {
  "name":"Jon",
  "age": 30
 },
 {
  "name":"Bob",
  "age": 30
 },
 {
  "name":"Nick",
  "age": 40
 },
 {
  "name":"Bob",
  "age": 40
 }
]

I need to remove the entries which has duplicate names which would be the 'Bob' in that case. So after executing the query I am expecting this result

[
 {
  "name":"Jon",
  "age": 30
 },
 {
  "name":"Bob",
  "age": 30
 },
 {
  "name":"Nick",
  "age": 40
 }
]

What would be the proper sql query to do that? Actually I am trying to achieve no duplicate names rather than no duplicate entries. That's why 2 Bobs have different ages in the above example. More specifically I need to keep only first items among duplicates for this example the first Bob with age 30.Using ROW_NUMBER() and Partition By would be solution but it breaks the existing order.I need to achieve this without breaking the existing order. So I have the table with Id and PeopleJson fields. The following query would achieve what I want to achieve but it breaks the order in PeopleJson

SELECT Id, (
    SELECT [Name],[Age] FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY [Name] ORDER BY (select NULL)) as row_num
        FROM OPENJSON(PeopleJson) WITH ([Name] NVARCHAR(1000), [Age]  int)
    ) t WHERE t.row_num = 1
    FOR JSON PATH, INCLUDE_NULL_VALUES
) as [People]
 From [TestTable]

CodePudding user response：

WITH cte AS (
    SELECT DISTINCT value
    FROM OPENJSON(@json)
)
SELECT * FROM cte

DECLARE @json NVARCHAR(MAX) = N'[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}, {"id": 1, "name": "Alice"}]'

SELECT (
    SELECT *
    FROM (
        SELECT DISTINCT value
        FROM OPENJSON(@json)
    ) cte
    FOR JSON AUTO
) AS result

[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]

CodePudding user response：

Can you provide some more information, please? I understand what you're trying to do, but I have some questions.

Each Bob has a different age, so those aren't duplicate entries, only duplicate names. Either way, it would be hard to decide which entry to remove if each one is different.

You can achieve no duplicate "Bob" entries, but the issue comes in when deciding which Bob record you want to keep.