find all rows that have duplicate entries inside json column-CodePudding

I am stuck with a problem where I have a table with JSON column like this:

ID|VALUE
1 |{"a":"text1234","b":"default"}
2 |{"a":"text1234","b":"default"}
3 |{"a":"text1234","b":"text234"}
4 |{"a":"text1234","b":"default2"}
5 |{"a":"text1234","b":"default2"}

I would like to get all rows where value "b" is duplicate, so with the table above I would get rows 1,2,4,5.

I tried to group rows by value->b

$value_ids = ProductsAttributesValues::groupBy("value->b")->get();

but when i dd($value_ids) rows are not grouped by value->default. And I can't find a way to group them, so I can then count them. Or would there be a better way with doing this?

CodePudding user response：

Try the json_extract function:

select count(id) dup_count, json_extract(`value`,"$.b") as dup_value
from test
group by json_extract(`value`,"$.b") 
having dup_count>1
;
-- result set:
| dup_count | dup_value  |
 ----------- ------------ 
|         2 | "default"  |
|         2 | "default2" |

-- to get the id involved:
select id,dup_count,dup_value
from (select id,json_extract(`value`,"$.b") as dup_v
    from test) t1 
    join
    (select count(id) dup_count, json_extract(`value`,"$.b") as dup_value
    from test
    group by json_extract(`value`,"$.b") 
    having dup_count>1) t2
    on t1.dup_v=t2.dup_value
;
-- result set:
| id   | dup_count | dup_value  |
 ------ ----------- ------------ 
|    1 |         2 | "default"  |
|    2 |         2 | "default"  |
|    4 |         2 | "default2" |
|    5 |         2 | "default2" |

CodePudding user response：

Here is the queries that can do your task.

/*Extract value of "b" - Step 1*/
DROP TEMPORARY TABLE IF EXISTS d1;
CREATE TEMPORARY TABLE d1
SELECT 
    ID, `VALUE`, SUBSTR(VALUE FROM POSITION(',"b":' IN VALUE) 5 FOR 1000) AS v
FROM mytest
;

/*Extract value of "b" - Step 2*/
DROP TEMPORARY TABLE IF EXISTS d2;
CREATE TEMPORARY TABLE d2
SELECT 
    ID, LEFT(v, LENGTH(v)-1) AS b
FROM 
    d1
;
ALTER TABLE d2 ADD INDEX b(b);

/* Search for duplicates */
DROP TEMPORARY TABLE IF EXISTS duplicates;
CREATE TEMPORARY TABLE duplicates
SELECT 
    b, COUNT(b) AS b_count 
FROM 
    d2
GROUP BY b HAVING COUNT(b)>1
;
ALTER TABLE duplicates ADD INDEX b(b);

/* Display for duplicates */
SELECT 
    d2.ID, d2.b 
FROM 
    d2 
    INNER JOIN duplicates ON d2.b=duplicates.b
;

This should give you :

1   "default"
2   "default"
4   "default2"
5   "default2"