I am trying to create directories and subdirectories based on the names of existing files. After that I want to move those files into the corresponding directories. I have already come pretty far, also with the help of other posts on this site, but I am failing at some point.
Existing test files (actually about 5000 files) | Folder structure (how it should look afterwards) |
---|---|
MM0245AK625_G03_701.txt | MM\MM0245\625\G03\MM0245AK625_G03_701.txt |
MM0245AK830_G04_701.txt | MM\MM0245\830\G04\MM0245AK830_G04_701.txt |
VY0245AK_G03.txt | VY\VY0245\VY0245AK_G03.txt |
VY0245AK_G03_701.txt | VY\VY0245\G03\VY0245AK_G03_701.txt |
VY0245AK625_G03.txt | VY\VY0245\625\VY0245AK625_G03.txt |
VY0245AK625_G03_701.txt | VY\VY0245\625\G03\VY0245AK625_G03_701.txt |
VY0345AK625_G03_701.txt | VY\VY0345\625\G03\VY0345AK625_G03_701.txt |
Code for creating those files is at the end of this post.
As you can see, the files do match some kind of pattern, but not consistently. I use multiple copies of my code with different 'parameters' to sort each type of file pattern, but there has to be a more streamlined way.
Existing code
$dataPath = "$PSScriptRoot\Test"
#$newDataPath = "$PSScriptRoot\"
Get-ChildItem $dataPath -Filter *.txt | % {
    $g1 = $_.BaseName.Substring(0, 2)
    $g2 = $_.BaseName.Substring(0, 6)
    $g3 = $_.BaseName.Substring(8, 3)
    $g4 = $_.BaseName.Substring(12, 3)
    $path = "$DataPath\$g1\$g2\$g3\$g4"
    if (-not (Test-Path $path)) {
        New-Item -ItemType Directory -Path $path
    }
    Move-Item -Path $_.FullName -Destination $path
}
This code also creates directories in the 3rd layer ($g3) for files in "the shorter format", e.g. XX0000AK_G00.txt. Such a file should, however, not be moved further than layer $g2. Of course the code above is not capable of doing this, so I tried it with regex below.
This is an alternative idea (not worked out further than creating directories), but I failed to continue after Select-Object -Unique. I am failing to use $Matches[1] in New-Item, because I can only Select-Object -Unique the variable $_, not $Matches[1] or even the subdirectory "$($Matches[1])$($Matches[2])". The following code is my attempt.
cd $PSScriptRoot\Test
# Create Folder Layer 1
Get-ChildItem |
% {
$_.BaseName -match "^(\w{2})(\d{4})AK(\d{3})?_(G\d{2})(_\d{3})?$" | Out-Null
$Matches[1]
"$($Matches[1])$($Matches[2])"
} |
Select-Object -Unique |
% {
New-Item -ItemType directory $_
} | Out-Null
I am fairly new to PowerShell, please don't be too harsh :) I also don't have a programming background, so please excuse the use of incorrect wording.
Code for creating the test files
new-item $dataPath\MM0245AK625_G03_701.txt -ItemType File
new-item $dataPath\MM0245AK830_G04_701.txt -ItemType File
new-item $dataPath\VY0245AK_G03.txt -ItemType File
new-item $dataPath\VY0245AK_G03_701.txt -ItemType File
new-item $dataPath\VY0245AK625_G03.txt -ItemType File
new-item $dataPath\VY0245AK625_G03_701.txt -ItemType File
new-item $dataPath\VY0345AK625_G03_701.txt -ItemType File
CodePudding user response:
i am truly bad at complex regex patterns [blush], so this is done with simple string ops, mostly.
what the code does ...
- fakes reading in some files
  when you have tested this and it works as needed on all your test files, replace the entire #region/#endregion block with a Get-ChildItem call.
- iterates thru the collection
- splits the BaseName on ak & saves it for later use
- checks for the two short file layouts
- checks for 1 _ versus 2
- builds the $Dir string for each of those 2 filename layouts
- builds the long file name $Dir
- uses the previous $Dir stuff to build the $FullDest for each file
- shows the various stages for each file
that last section would be replaced with your mkdir & Move-Item commands.
the code ...
#region >>> fake reading in files
# when ready to use the real things, use Get-ChildItem
$InStuff = @'
MM0245AK625_G03_701.txt
MM0245AK830_G04_701.txt
VY0245AK_G03.txt
VY0245AK_G03_701.txt
VY0245AK625_G03.txt
VY0245AK625_G03_701.txt
VY0345AK625_G03_701.txt
'@ -split [System.Environment]::NewLine |
ForEach-Object {
[System.IO.FileInfo]$_
}
#endregion >>> fake reading in files
foreach ($IS_Item in $InStuff)
    {
    # -split is case-insensitive, so 'ak' also matches 'AK'
    $BNSplit_1 = $IS_Item.BaseName -split 'ak'
    # short layouts: nothing sits between 'AK' and the first underscore
    if ($BNSplit_1[-1].StartsWith('_'))
        {
        # strip everything but underscores, then count them
        if (($BNSplit_1[-1] -replace '[^_]').Length -eq 1)
            {
            $Dir = '{0}\{1}' -f $IS_Item.BaseName.Substring(0, 2),
                                $IS_Item.BaseName.Substring(0, 6)
            }
            else
            {
            $Dir = '{0}\{1}\{2}' -f $IS_Item.BaseName.Substring(0, 2),
                                    $IS_Item.BaseName.Substring(0, 6),
                                    $IS_Item.BaseName.Split('_')[1]
            }
        }
        else
        {
        $Dir = '{0}\{1}\{2}\{3}' -f $IS_Item.BaseName.Substring(0, 2),
                                    $IS_Item.BaseName.Substring(0, 6),
                                    $BNSplit_1[-1].Split('_')[0],
                                    $BNSplit_1[-1].Split('_')[1]
        }
    # use .Name so the result does not depend on how a FileInfo converts to a string
    $FullDest = Join-Path -Path $Dir -ChildPath $IS_Item.Name
    #region >>> show what was done with each file
    # replace this block with your MkDir & Move-Item commands
    $IS_Item.Name
    $Dir
    $FullDest
    'depth = {0}' -f ($FullDest.Split('\').Count - 1)
    '=' * 20
    #endregion >>> show what was done with each file
    }
the output ...
MM0245AK625_G03_701.txt
MM\MM0245\625\G03
MM\MM0245\625\G03\MM0245AK625_G03_701.txt
depth = 4
====================
MM0245AK830_G04_701.txt
MM\MM0245\830\G04
MM\MM0245\830\G04\MM0245AK830_G04_701.txt
depth = 4
====================
VY0245AK_G03.txt
VY\VY0245
VY\VY0245\VY0245AK_G03.txt
depth = 2
====================
VY0245AK_G03_701.txt
VY\VY0245\G03
VY\VY0245\G03\VY0245AK_G03_701.txt
depth = 3
====================
VY0245AK625_G03.txt
VY\VY0245\625\G03
VY\VY0245\625\G03\VY0245AK625_G03.txt
depth = 4
====================
VY0245AK625_G03_701.txt
VY\VY0245\625\G03
VY\VY0245\625\G03\VY0245AK625_G03_701.txt
depth = 4
====================
VY0345AK625_G03_701.txt
VY\VY0345\625\G03
VY\VY0345\625\G03\VY0345AK625_G03_701.txt
depth = 4
====================
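The display block could be swapped for something along these lines. This is only a sketch under assumptions: $DataPath (the folder that holds the real files) and $TargetDir are names made up here, and a throwaway temp folder with a single test file stands in for the real data so the snippet runs on its own.

```powershell
# assumption: $DataPath is the folder that holds the files; a temp folder stands in for it here
$DataPath = Join-Path ([System.IO.Path]::GetTempPath()) ('SortDemo_' + [guid]::NewGuid().ToString('N'))
$null = New-Item -Path $DataPath -ItemType Directory
$IS_Item = New-Item -Path (Join-Path $DataPath 'VY0245AK_G03_701.txt') -ItemType File

# the relative folder the loop builds for this file (VY\VY0245\G03 on Windows)
$Dir = 'VY', 'VY0245', 'G03' -join [System.IO.Path]::DirectorySeparatorChar

# -Force creates any missing parent folders and does not fail if the folder already exists
$TargetDir = Join-Path -Path $DataPath -ChildPath $Dir
$null = New-Item -Path $TargetDir -ItemType Directory -Force

# add -WhatIf here to preview the move without touching the file
Move-Item -LiteralPath $IS_Item.FullName -Destination $TargetDir

# check that the file ended up in the new folder
$Moved = Test-Path -LiteralPath (Join-Path $TargetDir $IS_Item.Name)
```

Inside the real loop only the New-Item and Move-Item lines would be needed, since $IS_Item and $Dir already exist there.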
CodePudding user response:
I would first split each file BaseName on the underscore. Then use a regex to split the first part into several array elements, combine that with a possible second part of the split in order to create the destination folder path for the files.
$DataPath = "$PSScriptRoot\Test"
$files = Get-ChildItem -Path $DataPath -Filter '*_*.txt' -File
foreach ($file in $files) {
    $parts = $file.BaseName -split '_'
    # regex your way to split the first part into path elements (remove empty items)
    $folders = [regex]::Match($parts[0], '(?i)^(.{2})(\d{4})[A-Z]{2}(\d{3})?').Groups[1..3].Value | Where-Object { $_ -match '\S'}
    # the second part is a merge with the first part
    $folders[1] = $folders[0] + $folders[1]
    # if there was a third part after the split on the underscore, add $parts[1] (i.e. 'Gxx') to the folders array
    if ($parts.Count -gt 2) { $folders += $parts[1] }
    # join the array elements with a backslash (i.e. [System.IO.Path]::DirectorySeparatorChar)
    # and join all that to the $DataPath to create the full destination for the file
    $target = Join-Path -Path $DataPath -ChildPath ($folders -join '\')
    # create the folder if that does not yet exist
    $null = New-Item -Path $target -ItemType Directory -Force
    # move the file to that (new) directory
    $file | Move-Item -Destination $target -WhatIf
}
The -WhatIf switch makes the code not move anything to the new destination; it will only display where the file would go. Once you are happy with that information, remove -WhatIf and run the code again.
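To check the mapping before touching any files, the folder rule from the question's table can also be written as one small function that only works on strings. This is a minimal sketch: the function name Get-RelativeTargetDir and the group names are made up for illustration, and the pattern assumes every name looks like two letters, four digits, two letters, an optional three-digit block, then _Gxx and an optional _nnn suffix.

```powershell
function Get-RelativeTargetDir {
    param([Parameter(Mandatory)][string]$BaseName)

    # assumed layout: 2 letters, 4 digits, 2 letters, optional 3 digits, _Gxx, optional _nnn
    $pattern = '^(?<prefix>[A-Z]{2})(?<num>\d{4})[A-Z]{2}(?<size>\d{3})?_(?<grp>G\d{2})(?<suffix>_\d{3})?$'

    if ($BaseName -match $pattern) {
        # layers 1 and 2 always exist: 'VY' and 'VY0245'
        $folders = @($Matches['prefix'], ($Matches['prefix'] + $Matches['num']))
        # layer 3 only if the 3-digit block is present
        if ($Matches['size'])   { $folders += $Matches['size'] }
        # the Gxx layer only if the name carries the trailing _nnn part
        if ($Matches['suffix']) { $folders += $Matches['grp'] }
        $folders -join '\'
    }
    # names that do not fit the pattern produce no output
}

# print the folder for each layout that appears in the question
'MM0245AK625_G03_701', 'VY0245AK_G03', 'VY0245AK_G03_701', 'VY0245AK625_G03' |
    ForEach-Object { '{0} -> {1}' -f $_, (Get-RelativeTargetDir -BaseName $_) }
```

The result could be passed to Join-Path together with $DataPath in place of ($folders -join '\') in the loop above. Names that do not fit the assumed pattern return nothing, so they would need separate handling.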