Home > Enterprise >  How to list duplicates based on different criteria's T-SQL
How to list duplicates based on different criteria's T-SQL

Time:12-01

I'm looking for someone to help me with a very specific task I have.

I'm analysing data from computer hard drives and need to be able to list folders which are duplicated after being extracted from .zip files. Here is an example of the data I am working with:

ItemName Extension ItemType
MyZipFolder.zip .zip File
MyZipFolder null Folder
PersonalDocuments.zip .zip File
PersonalDocuments null Folder

As you can see the extension '.zip' is included in the 'ItemName' and 'Extension' column. When extracted from a .zip file, it becomes a folder. I need a way of listing either the .zip file or the folder which it becomes after extraction (either will do, it just needs to be listed with the knowledge that it is a duplicate).

The caveat to this is that my data consists of plenty other folders and files with different extensions e.g. '.docx', '.msg' so the query needs to discount these.

I hope this makes sense - thanks!

Expected output might look something like this:

ItemName Extension ItemType
MyZipFolder null Folder
PersonalDocuments null Folder

So a list of all the folders which I know have a .zip equivalent in the data.

CodePudding user response:

Not sure yet, but do you mean something like this?

select *
from your_table y
where ItemType = 'Folder'
  and exists (
    select 1 from your_table yy
    where yy.Extension = '.zip'
      and replace(yy.ItemName,'.zip','') = y.ItemName
  )

CodePudding user response:

I think I got what you need :

select ItemName 
from tablename
group by replace(ItemName, isnull(Extension,''))
having case count(case when Extension = '.zip' then 1 end) > 1
  • Related