I'm looking for someone to help me with a very specific task I have.
I'm analysing data from computer hard drives and need to be able to list folders which are duplicated after being extracted from .zip files. Here is an example of the data I am working with:
ItemName | Extension | ItemType |
---|---|---|
MyZipFolder.zip | .zip | File |
MyZipFolder | null | Folder |
PersonalDocuments.zip | .zip | File |
PersonalDocuments | null | Folder |
As you can see the extension '.zip' is included in the 'ItemName' and 'Extension' column. When extracted from a .zip file, it becomes a folder. I need a way of listing either the .zip file or the folder which it becomes after extraction (either will do, it just needs to be listed with the knowledge that it is a duplicate).
The caveat to this is that my data consists of plenty other folders and files with different extensions e.g. '.docx', '.msg' so the query needs to discount these.
I hope this makes sense - thanks!
Expected output might look something like this:
ItemName | Extension | ItemType |
---|---|---|
MyZipFolder | null | Folder |
PersonalDocuments | null | Folder |
So a list of all the folders which I know have a .zip equivalent in the data.
CodePudding user response:
Not sure yet, but do you mean something like this?
select *
from your_table y
where ItemType = 'Folder'
and exists (
select 1 from your_table yy
where yy.Extension = '.zip'
and replace(yy.ItemName,'.zip','') = y.ItemName
)
CodePudding user response:
I think I got what you need :
select ItemName
from tablename
group by replace(ItemName, isnull(Extension,''))
having case count(case when Extension = '.zip' then 1 end) > 1