Removing SQL Rows from Query if two rows have an identical ID but differences in the columns-CodePudding

I´m currently working stuck on a SQL issue (well, mainly because I can´t find a way to google it and my SQL skills do not suffice to solve it myself)

I´m working on a system where documents are edited. If the editing process is finished, users mark the document as solved. In the MSSQL database, the corresponding row is not updated but instead, a new row is generated. Thus, every document that has been processed has [e.g.: should have] multiple rows in the DB.

See the following situation:

ID	ID2	Solved	Steps	Process	Solved
1	1	yes	Three	ATAT	AF
2	2	yes	One	ATAT	FR
2	3	yes	One	ATAT	EG
2	4	yes	One	ATAT	AF
3	5	no	One	ABAT	AF
4	6	yes	One	ATAT	FR
5	7	no	One	AVAT	EG
6	8	yes	Two	SATT	FR
6	9	yes	Two	SATT	EG
6	10	yes	Two	SATT	AF

Now: I need to find out which documents have not been processed yet. In order to achieve this, I have to remove the rows with the IDs 2 | 4 (because the system sadly isn´t too reliable I guess) | and 6.

What I could do is filter the duplicates afterwards, in python/js/whatever. But I am curious whether I can remove these rows directly in SQL as well.

CodePudding user response：

I'm not sure I understand your question correct:

...every document that has been processed has [...] multiple rows in the DB

I need to find out which documents have not been processed yet

So it seems you need to find unique documents with no versions, this could be done using a GROUP BY with a HAVING clause:

SELECT
Id
FROM dbo.TableName
GROUP BY Id
HAVING COUNT(*) = 1

CodePudding user response：

If you need to delete all rows where every id doesn't have any "Solved = 'no'", you can use a DELETE statement that will exclude all "id" values that have at least one "Solved = 'no'" in the corresponding rows.

DELETE FROM tab
WHERE id NOT IN (SELECT id FROM tab WHERE Solved1 = 'no');

Check the demo here.