I have a directory containing over 30,000 files with the following naming convention:
apple_CQ2rtQFD15H_1.txt
apple_CQVku8Qjdzx_1.txt
apple_CQ2rtQFD15H_3.txt
bananna___Bo3-mKXnozt___.txt
bananna___CPf6gN5L3SP___.txt
bananna___CTu8APZMomD___.txt
bananna_rotten___Byj7BPXNnpE___.txt
I want to create a script that will create a new directory based on the first part of the filename before the first uppercase character and move all corresponding files into that directory.
For example, all files starting with apple_ would be moved into a newly created directory called apple_.
Has anyone got a solution to this? I was thinking of modifying this Tcl script:
cd "C:/Development/test"
# glob is a Tcl command that lists all files matching the given pattern
set files [glob TTT*_*]
foreach f $files {
# use the underscore as a separator to split f and store the parts in dir and fnew
lassign [split $f "_"] dir fnew
if {![file exist $dir]} {
file mkdir $dir
}
file rename $f [file join $dir $fnew]
}
And using the regular expression [A-Z]\S in place of the [split $f "_"], but I don't really know how to implement it.
CodePudding user response:
Would you please try a Linux bash solution:
#!/bin/bash
cd "C:/Development/test"
for f in *.txt; do # loop over the *.txt files
if [[ $f =~ ([^A-Z]+)(.+) ]]; then # split on the 1st capital letter
dir=$(sed "s/_\+$//" <<< "${BASH_REMATCH[1]}")
# remove trailing underscores of the 1st capture group
fnew="${BASH_REMATCH[2]}" # 2nd capture group
mkdir -p -- "$dir" # create a directory "$dir" if nonexistent
mv -- "$f" "$dir/$fnew" # rename the file to "$dir/$fnew"
fi
done
Result with the provided files:
C:/Development/test/
|-- apple
|   |-- CQ2rtQFD15H_1.txt
|   |-- CQ2rtQFD15H_3.txt
|   `-- CQVku8Qjdzx_1.txt
|-- bananna
|   |-- Bo3-mKXnozt___.txt
|   |-- CPf6gN5L3SP___.txt
|   `-- CTu8APZMomD___.txt
`-- bananna_rotten
    `-- Byj7BPXNnpE___.txt
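As a possible alternative to the regex match and the sed call, the same split can be sketched with plain parameter expansion, which should also work in POSIX sh. The helper names split_prefix and strip_underscores are made up for this illustration, and [[:upper:]] is assumed to be what "uppercase character" means here. Note that a name with no uppercase letter is returned unchanged by split_prefix.

```shell
# Hypothetical helpers for illustration; not part of the script above.

# Print everything before the first uppercase letter of $1.
split_prefix() {
    printf '%s\n' "${1%%[[:upper:]]*}"
}

# Print $1 with any trailing underscores removed.
strip_underscores() {
    # ${1##*[!_]} expands to the run of trailing underscores (possibly empty),
    # which the outer ${1%...} then removes from the end.
    printf '%s\n' "${1%"${1##*[!_]}"}"
}

# Example: directory name for one of the sample files.
dir=$(strip_underscores "$(split_prefix 'bananna_rotten___Byj7BPXNnpE___.txt')")
printf '%s\n' "$dir"
```

For the sample name this prints bananna_rotten, matching the directory name in the result tree.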
CodePudding user response:
Before:
$ tree .
.
├── apple_CQ2rtQFD15H_1.txt
├── apple_CQ2rtQFD15H_3.txt
├── apple_CQVku8Qjdzx_1.txt
├── bananna___Bo3-mKXnozt___.txt
├── bananna___CPf6gN5L3SP___.txt
├── bananna___CTu8APZMomD___.txt
└── bananna_rotten___Byj7BPXNnpE___.txt
With Tcl:
foreach file [glob -- *] {
if {[regexp {^([^_]+)} $file -> prefix]} {
file mkdir $prefix
file rename $file $prefix
}
}
After:
$ tree .
.
├── apple
│   ├── apple_CQ2rtQFD15H_1.txt
│   ├── apple_CQ2rtQFD15H_3.txt
│   └── apple_CQVku8Qjdzx_1.txt
└── bananna
    ├── bananna___Bo3-mKXnozt___.txt
    ├── bananna___CPf6gN5L3SP___.txt
    ├── bananna___CTu8APZMomD___.txt
    └── bananna_rotten___Byj7BPXNnpE___.txt
CodePudding user response:
"before the first uppercase character"
With pure bash, in a version that supports the =~ test operator:
#!/usr/bin/env bash
cd "C:/Development/test" || exit
##: Just in case there are no files ending in *.txt
##: the glob will not expand with nullglob on.
shopt -s nullglob dotglob
files=(*.txt)
##: See https://mywiki.wooledge.org/BashFAQ/004
(( ${#files[*]} )) || {
printf 'directory does not contain *.txt files!\n' >&2
exit 1
}
shopt -u nullglob dotglob
for f in "${files[@]}"; do
##: Split from the first Upper case letter.
[[ $f =~ ^([^[:upper:]]+)(.*)$ ]] &&
mkdir -vp -- "${BASH_REMATCH[1]}" || exit
mv -v -- "$f" "${BASH_REMATCH[1]}" || exit
done
Output
mkdir: created directory 'apple_'
renamed 'apple_CQ2rtQFD15H_1.txt' -> 'apple_/apple_CQ2rtQFD15H_1.txt'
renamed 'apple_CQ2rtQFD15H_3.txt' -> 'apple_/apple_CQ2rtQFD15H_3.txt'
renamed 'apple_CQVku8Qjdzx_1.txt' -> 'apple_/apple_CQVku8Qjdzx_1.txt'
mkdir: created directory 'bananna___'
renamed 'bananna___Bo3-mKXnozt___.txt' -> 'bananna___/bananna___Bo3-mKXnozt___.txt'
renamed 'bananna___CPf6gN5L3SP___.txt' -> 'bananna___/bananna___CPf6gN5L3SP___.txt'
renamed 'bananna___CTu8APZMomD___.txt' -> 'bananna___/bananna___CTu8APZMomD___.txt'
mkdir: created directory 'bananna_rotten___'
renamed 'bananna_rotten___Byj7BPXNnpE___.txt' -> 'bananna_rotten___/bananna_rotten___Byj7BPXNnpE___.txt'
CodePudding user response:
@ECHO OFF
SETLOCAL
rem The following setting for the source directory is a name
rem that I use for testing and deliberately include names which include spaces to make sure
rem that the process works using such names. These will need to be changed to suit your situation.
SET "sourcedir=u:\your files"
SET "tempfile=%temp%\####.$$$"
(
FOR /f "delims=" %%e IN (
'dir /b /a-d "%sourcedir%\*" '
) DO ( FOR /f "delims=ABCD" %%y IN ("%%e") DO ECHO %%y:%%e
)
)>"%tempfile%"
FOR /f "tokens=1*delims=:" %%b IN ('sort /r "%tempfile%" ') do echo MD "%%b" 2>nul&echo MOVE "%%c" "%%b"
del "%tempfile%"
GOTO :EOF
Always verify against a test directory before applying to real data.
The for...%%e loop assigns each filename found to %%e, and the for...%%y loop then assigns the part before the delimiters to %%y (in practice, use all of the upper-case letters A-Z as delimiters, not just ABCD).
The report in the tempfile is thus (e.g.)
apple_:apple_CQ2rtQFD15H_1.txt
So, using the delimiter : (which can't appear in a filename), sort in reverse order so that the longest prefix appears first; make the directory from the part before the : in %%b (2>nul suppresses complaints that the directory already exists) and then move the file named in %%c to that directory.
I've just echoed the commands for testing. You'd need to remove the echo keyword before the md and move to activate them.
If necessary, you could prefix %sourcedir%\ before %%b or %%c as required.
The move command can be silenced by appending >nul
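The echo-first approach can be sketched in shell as well. This is only an illustration of the idea, not a translation of the batch file: plan_moves is a hypothetical helper that reads file names on standard input and prints the mkdir and mv commands it would run, so the plan can be reviewed before anything is moved. The printed lines are meant for reading, not for piping back into a shell, since names containing quote characters would not be escaped.

```shell
# Hypothetical dry-run helper: reads one file name per line and prints,
# without executing them, the commands that would sort each file into a
# directory named after the part before the first uppercase letter.
plan_moves() {
    while IFS= read -r name; do
        prefix=${name%%[[:upper:]]*}
        # Skip names with no uppercase letter or with nothing before it.
        if [ -z "$prefix" ] || [ "$prefix" = "$name" ]; then
            continue
        fi
        printf 'mkdir -p -- "%s"\n' "$prefix"
        printf 'mv -- "%s" "%s/"\n' "$name" "$prefix"
    done
}

# Example: one name that qualifies and one that is skipped.
printf '%s\n' 'apple_CQ2rtQFD15H_1.txt' 'readme.txt' | plan_moves
```

Once the printed plan looks right, the two printf lines could be replaced by the actual mkdir and mv calls, which corresponds to removing the echo keyword in the batch version.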
---- on second thoughts ----
Since each individual filename is available, the temporary file and the sort are not needed:
@ECHO OFF
SETLOCAL
rem The following setting for the source directory is a name
rem that I use for testing and deliberately include names which include spaces to make sure
rem that the process works using such names. These will need to be changed to suit your situation.
SET "sourcedir=u:\your files"
FOR /f "delims=" %%e IN (
'dir /b /a-d "%sourcedir%\*" '
) DO ( FOR /f "delims=ABCD" %%y IN ("%%e") DO ECHO MD %%y 2>nul&ECHO MOVE "%%e" "%%y"
)
GOTO :EOF
CodePudding user response:
Here is one way to do it in PowerShell using Group-Object. -Force is used with New-Item so that the code is re-usable. From the New-Item documentation:
Example 8: Use the -Force parameter to attempt to recreate folders
This example creates a folder with a file inside. Then, attempts to create the same folder using -Force. It will not overwrite the folder but simply return the existing folder object with the file created intact.
$initialPath = 'path/to/txtFiles'
$destination = 'path/where/to/createNewFolders'
# Get all files in `$initialPath` whose name contains
# at least one `_` and whose extension is `.txt`
Get-ChildItem -LiteralPath $initialPath -Filter *_*.txt | Group-Object {
# Group all files by everything up to the first `_` followed by an Uppercase Letter
[regex]::Match($_.BaseName, '(.+?_)[A-Z]').Groups[1].Value
} | ForEach-Object {
# Create a new path for this group of files
$path = Join-Path $destination -ChildPath $_.Name
# Create a new folder using above path
$dir = New-Item $path -ItemType Directory -Force
# Move all files in this group to the new folder
$_.Group | Move-Item -Destination $dir.FullName
}
For a quick demo to see how the cmdlet works you can use this:
@'
apple_C_Q2rtQFD15H_1.txt
apple_CQVku8Qjdzx_1.txt
apple_CQ2rtQFD15H_3.txt
bananna___Bo3-mKXnozt___.txt
bananna___CPf6gN5L3SP___.txt
bananna___CTu8APZMomD___.txt
bananna_rotten___Byj7BPXNnpE___.txt
'@ -split '\r?\n' -as [System.IO.FileInfo[]] | Group-Object {
[regex]::Match($_.BaseName, '(.+?_)[A-Z]').Groups[1].Value
}
And here is a demo for the regex: https://regex101.com/r/qK3Ppp/1
Since it's not clear if you want to exclude or include the underscores for the directory creation, here is another regex if you want to exclude the underscores:
@'
apple_C_Q2rtQFD15H_1.txt
apple_CQVku8Qjdzx_1.txt
apple_CQ2rtQFD15H_3.txt
bananna___Bo3-mKXnozt___.txt
bananna___CPf6gN5L3SP___.txt
bananna___CTu8APZMomD___.txt
bananna_rotten___Byj7BPXNnpE___.txt
'@ -split '\r?\n' -as [System.IO.FileInfo[]] | Group-Object {
[regex]::Match($_.BaseName, '(.+?)_{1,}[A-Z]').Groups[1].Value
}
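For readers without PowerShell, a rough shell analogue of the grouping demo is sketched below. It assumes awk is available and uses the underscore-excluding rule (the part before the first uppercase letter, with trailing underscores removed), which gives the same groups as the regex above for the sample names but is not an exact equivalent. group_counts is a made-up name.

```shell
# Hypothetical helper: reads file names on stdin and prints "prefix count"
# for each group, sorted by prefix.
group_counts() {
    awk '{
        p = $0
        sub(/[A-Z].*$/, "", p)   # drop everything from the first uppercase letter
        sub(/_+$/, "", p)        # drop trailing underscores
        n[p]++
    }
    END { for (k in n) print k, n[k] }' | LC_ALL=C sort
}

# Example with a few of the sample names.
printf '%s\n' \
    'apple_C_Q2rtQFD15H_1.txt' \
    'apple_CQVku8Qjdzx_1.txt' \
    'bananna___Bo3-mKXnozt___.txt' \
    'bananna_rotten___Byj7BPXNnpE___.txt' | group_counts
```

With these four names the sketch prints apple 2, bananna 1 and bananna_rotten 1, one group per line.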