I am having a hard time with PowerShell (because I am learning it on the fly). I have a huge amount of data and I am trying to find a unique identifier for every folder with data. I wrote a script that MD5-hashes every folder recursively and saves the hash value for each folder. But, as you might have guessed, it is super slow. So I thought I would hash only the metadata, but I have no idea how to do this in PowerShell. The ideas from the internet are not working: they always return the same hash value. Has anyone had a similar problem? Is there a magic PowerShell trick for such a task?
Sorry for the lack of precision.
I have a list of about 20,000 folders. Every folder contains unique data: photos, files, etc. I iterated through every folder and computed the hash of every file (I actually used a crypto stream here so that I got a single hash for the data). This solution takes ages.
The solution I wanted to adopt was to use the metadata, like the properties returned by this command:
Get-ChildItem -Path $Env:USERPROFILE\Desktop -Force | Select-Object -First 1 | Format-List *
But hashing this always gives me the same value, even when something has changed. I need to be able to check whether anything has changed in those files.
CodePudding user response:
Continuing from my comment.
Third-party tool: http://www.idrix.fr/Root/Samples/DirHash.zip
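function Get-FolderHash ($folder)
{
    # One SHA1 instance is reused for every file and for the final hash.
    $hasher = [System.Security.Cryptography.SHA1]::Create()
    # Hash each file separately, in a stable order, so every file contributes.
    $fileHashes = Get-ChildItem $folder -Recurse | Where-Object { -not $_.PSIsContainer } | Sort-Object FullName |
        ForEach-Object { [BitConverter]::ToString($hasher.ComputeHash([System.IO.File]::ReadAllBytes($_.FullName))) }
    # Hash the concatenated per-file hashes to get a single value for the folder.
    [Byte[]]$contents = [Text.Encoding]::UTF8.GetBytes([string]($fileHashes -join ''))
    [string]::Join("", $($hasher.ComputeHash($contents) |
        ForEach-Object { "{0:x2}" -f $_ }))
}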
Note that I've not tested/validated either of the above and will leave that to you.
Lastly, this is not the first time this kind of question has been asked on SO, with answers using the default cmdlet and some .NET, so this could be seen/marked as a duplicate.
# -File keeps directories out of the pipeline; only files can be hashed.
$HashString = (Get-ChildItem C:\Temp -Recurse -File |
    Get-FileHash -Algorithm MD5).Hash |
    Out-String
Get-FileHash -InputStream ([IO.MemoryStream]::new([char[]]$HashString))
Original, faster but less robust, method:
$HashString = Get-ChildItem C:\script\test\TestFolders -Recurse | Out-String
Get-FileHash -InputStream ([IO.MemoryStream]::new([char[]]$HashString))
This could be condensed into one line if wanted, although it starts getting harder to read:
Get-FileHash -InputStream ([IO.MemoryStream]::new([char[]]"$(Get-ChildItem C:\script\test\TestFolders -Recurse|Out-String)"))
Whether it's faster, or fast enough for your use case, is a different matter. Still, it does ensure that you get a different hash when the target folder changes.
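To check that on a throwaway folder, here is a minimal sketch. `Get-FolderContentHash` is a hypothetical helper name, not a built-in cmdlet; it wraps the more robust hash-of-hashes approach, adds a sort so the order is stable, and assumes PowerShell 5 or later (for `::new()`).

```powershell
# Hypothetical helper: one hash per folder, built from the per-file hashes.
function Get-FolderContentHash {
    param([Parameter(Mandatory)][string]$Path)

    # Per-file MD5 hashes, in a stable order, joined into one string.
    $hashString = (Get-ChildItem -LiteralPath $Path -Recurse -File |
        Sort-Object FullName |
        Get-FileHash -Algorithm MD5).Hash -join "`n"

    # Hash the joined string to get a single value for the folder.
    [byte[]]$bytes = [Text.Encoding]::UTF8.GetBytes([string]$hashString)
    $stream = [IO.MemoryStream]::new($bytes)
    try { (Get-FileHash -InputStream $stream -Algorithm MD5).Hash }
    finally { $stream.Dispose() }
}

# Demo on a temporary folder (created and removed here).
$demo = Join-Path ([IO.Path]::GetTempPath()) ([Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $demo | Out-Null
Set-Content -LiteralPath (Join-Path $demo 'a.txt') -Value 'first'
$before = Get-FolderContentHash -Path $demo

Set-Content -LiteralPath (Join-Path $demo 'a.txt') -Value 'second'
$after = Get-FolderContentHash -Path $demo

$before -ne $after   # True: the file content changed, so the folder hash changed
Remove-Item -LiteralPath $demo -Recurse -Force
```

This still reads every byte of every file, so it will not be any faster than the original script; it only confirms that the hash reacts to changes.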
CodePudding user response:
First, create an MD5 class that does not create a new instance of System.Security.Cryptography.MD5 every time we create an MD5 from a string.
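class MD5 {
    # A single shared hasher, created once and reused for every call.
    static hidden [System.Security.Cryptography.MD5]$_md5 = [System.Security.Cryptography.MD5]::Create()
    static [string]Create([string]$inputString) {
        # UTF8 rather than ASCII, so non-ASCII characters in names are not all mapped to '?'.
        return [BitConverter]::ToString([MD5]::_md5.ComputeHash([Text.Encoding]::UTF8.GetBytes($inputString)))
    }
}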
Second, figure out a way to use each child item's Name, Length, CreationTimeUtc, and LastWriteTimeUtc to create unique ID text for each child in the folder, merge these into a single string, and create an MD5 based on the resulting string.
- Get the child objects of a folder.
- Select only certain properties, returning the content as a string array.
- Join the string array into a single string. No need for joining with newline.
- Convert the string into an MD5.
- Output the newly created MD5.
$ChildItems = Get-ChildItem -Path $Env:USERPROFILE\Desktop -Force
$SelectProperties = [string[]]($ChildItems | Select-Object -Property Name, Length, CreationTimeUtc, LastWriteTimeUtc)
$JoinedText = $SelectProperties -join ''
$MD5 = [MD5]::Create($JoinedText)
$MD5
Alternately, join the above lines into a very long command.
$AltMD5 = [MD5]::Create([string[]](Get-ChildItem -Path $Env:USERPROFILE\Desktop -Force | Select-Object -Property Name, Length, CreationTimeUtc, LastWriteTimeUtc) -join '')
$AltMD5
The resulting MD5 should be a unique signature of the folder's contents only, not of the folder itself. So, in theory, you could change the name of the folder itself and this MD5 would remain the same.
Not exactly sure how you aim to use this, but be aware that if any file or sub-folder directly inside the folder changes, the MD5 for the folder will also change. Get-ChildItem is called without -Recurse here, so items nested deeper are not examined.
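If nested content matters too, the same idea could be extended roughly as follows. This is only a sketch under assumptions: `Get-FolderMetadataHash` is a hypothetical name, SHA256 is swapped in for MD5, and relative path, size, and `LastWriteTimeUtc.Ticks` are just one possible choice of metadata. No file contents are read.

```powershell
# Hypothetical helper: hashes metadata of everything below $Path, recursively.
function Get-FolderMetadataHash {
    param([Parameter(Mandatory)][string]$Path)

    $root = (Resolve-Path -LiteralPath $Path).ProviderPath.TrimEnd('\', '/')
    $lines = Get-ChildItem -LiteralPath $root -Recurse -Force |
        ForEach-Object {
            # Relative path keeps the hash independent of the folder's own name.
            $relative = $_.FullName.Substring($root.Length)
            # Directories have no file size, so mark them explicitly.
            $size = if ($_.PSIsContainer) { 'dir' } else { $_.Length }
            # Ticks avoids culture-dependent date formatting.
            '{0}|{1}|{2}' -f $relative, $size, $_.LastWriteTimeUtc.Ticks
        } |
        Sort-Object

    [byte[]]$bytes = [Text.Encoding]::UTF8.GetBytes([string]($lines -join "`n"))
    $sha = [System.Security.Cryptography.SHA256]::Create()
    try { [BitConverter]::ToString($sha.ComputeHash($bytes)) -replace '-', '' }
    finally { $sha.Dispose() }
}

# Demo on a temporary folder (created and removed here).
$base = Join-Path ([IO.Path]::GetTempPath()) ([Guid]::NewGuid().ToString())
$sub = Join-Path $base 'sub'
New-Item -ItemType Directory -Path $sub -Force | Out-Null
Set-Content -LiteralPath (Join-Path $sub 'a.txt') -Value 'first'
$h1 = Get-FolderMetadataHash -Path $base

# Renaming the folder itself leaves the metadata of its contents untouched.
$renamed = "$base-renamed"
Rename-Item -LiteralPath $base -NewName (Split-Path $renamed -Leaf)
$h2 = Get-FolderMetadataHash -Path $renamed

# Changing a nested file changes its size, and therefore the folder hash.
Set-Content -LiteralPath (Join-Path (Join-Path $renamed 'sub') 'a.txt') -Value 'a longer second version'
$h3 = Get-FolderMetadataHash -Path $renamed

$h1 -eq $h2   # True
$h1 -ne $h3   # True
Remove-Item -LiteralPath $renamed -Recurse -Force
```

For many folders, the hash for each one could be stored (for example with Export-Csv) and compared against a fresh run later. Keep in mind that a metadata hash cannot detect an edit that preserves both the size and the timestamp of a file.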