How do I create a folder named from part of filename before first uppercase character and move corre

Time:09-05

I have a directory containing over 30,000 files with the following naming convention:

apple_CQ2rtQFD15H_1.txt
apple_CQVku8Qjdzx_1.txt
apple_CQ2rtQFD15H_3.txt
bananna___Bo3-mKXnozt___.txt
bananna___CPf6gN5L3SP___.txt
bananna___CTu8APZMomD___.txt
bananna_rotten___Byj7BPXNnpE___.txt

I want to create a script that will create a new directory based on the first part of the filename before the first uppercase character and move all corresponding files into that directory.

For example all files starting with apple_ will be inserted into a newly created directory called apple_.

Has anyone got a solution to this? I was thinking of modifying this Tcl script:

cd "C:/Development/test"
# glob is a Tcl command that lists all files matching the given pattern
set files [glob TTT*_*]
foreach f $files {
  # use the underscore as a separator to split f and store the parts in dir and fnew
  lassign [split $f "_"] dir fnew
  if {![file exist $dir]} {
    file mkdir $dir
  }
  file rename $f [file join $dir $fnew]
}

My idea is to use the regular expression [A-Z]\S in place of [split $f "_"], but I don't really know how to implement it.

CodePudding user response:

Would you please try a Linux bash solution:

#!/bin/bash

cd "C:/Development/test"

for f in *.txt; do                      # loop over the *.txt files
    if [[ $f =~ ([^A-Z]+)(.+) ]]; then  # split on the 1st capital letter
        dir=$(sed "s/_\+$//" <<< "${BASH_REMATCH[1]}")
                                        # remove trailing underscores of the 1st capture group
        fnew="${BASH_REMATCH[2]}"       # 2nd capture group
        mkdir -p -- "$dir"              # create a directory "$dir" if nonexistent
        mv -- "$f" "$dir/$fnew"         # rename the file to "$dir/$fnew"
    fi
done

Result with the provided files:

C:/Development/test/
 -- apple
|    -- CQ2rtQFD15H_1.txt
|    -- CQ2rtQFD15H_3.txt
|    -- CQVku8Qjdzx_1.txt
 -- bananna
|    -- Bo3-mKXnozt___.txt
|    -- CPf6gN5L3SP___.txt
|    -- CTu8APZMomD___.txt
 -- bananna_rotten
     -- Byj7BPXNnpE___.txt
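
With 30,000 files it may be worth previewing the planned moves before running a script like the one above. The following is only a sketch, not part of the original answer: the function name plan_move is hypothetical, and it uses plain parameter expansion instead of BASH_REMATCH so that it should run in any POSIX shell. It prints the mv command it would run and touches nothing on disk.

```shell
# plan_move NAME: print the mv command that would be run for NAME (dry run).
# The body runs in a subshell, so its variables do not leak to the caller.
plan_move() (
    name=$1
    head=${name%%[[:upper:]]*}           # everything before the first uppercase letter
    rest=${name#"$head"}                 # the remainder, starting at that letter
    dir=$head
    while [ "${dir%_}" != "$dir" ]; do   # strip trailing underscores one at a time
        dir=${dir%_}
    done
    # Print nothing for names with no usable prefix or no uppercase letter.
    if [ -n "$dir" ] && [ -n "$rest" ]; then
        printf 'mv -- "%s" "%s/%s"\n' "$name" "$dir" "$rest"
    fi
)

plan_move 'bananna_rotten___Byj7BPXNnpE___.txt'
```

For the sample name this prints mv -- "bananna_rotten___Byj7BPXNnpE___.txt" "bananna_rotten/Byj7BPXNnpE___.txt", which agrees with the tree shown above.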

CodePudding user response:

Before:

$ tree .
.
├── apple_CQ2rtQFD15H_1.txt
├── apple_CQ2rtQFD15H_3.txt
├── apple_CQVku8Qjdzx_1.txt
├── bananna___Bo3-mKXnozt___.txt
├── bananna___CPf6gN5L3SP___.txt
├── bananna___CTu8APZMomD___.txt
└── bananna_rotten___Byj7BPXNnpE___.txt

With Tcl:

foreach file [glob -- *] {
    if {[regexp {^([^_]+)} $file -> prefix]} {
        file mkdir $prefix
        file rename $file $prefix
    }
}

After:

$ tree .
.
├── apple
│   ├── apple_CQ2rtQFD15H_1.txt
│   ├── apple_CQ2rtQFD15H_3.txt
│   └── apple_CQVku8Qjdzx_1.txt
└── bananna
    ├── bananna___Bo3-mKXnozt___.txt
    ├── bananna___CPf6gN5L3SP___.txt
    ├── bananna___CTu8APZMomD___.txt
    └── bananna_rotten___Byj7BPXNnpE___.txt
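
Note that the Tcl snippet above groups by the text before the first underscore rather than before the first uppercase letter, which is why bananna_rotten___Byj7BPXNnpE___.txt ends up in bananna. As a rough shell illustration of that rule (the function name first_segment is made up for this sketch):

```shell
# first_segment NAME: print the part of NAME before its first underscore.
first_segment() {
    printf '%s\n' "${1%%_*}"    # %%_* strips the longest suffix that starts with "_"
}

first_segment 'bananna_rotten___Byj7BPXNnpE___.txt'
first_segment 'apple_CQ2rtQFD15H_1.txt'
```

The two calls print bananna and apple, matching the directory names in the tree above.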

CodePudding user response:

before the first uppercase character

With pure bash (any version that supports the =~ test operator):

#!/usr/bin/env bash

cd "C:/Development/test" || exit

##: Just in case there are no files ending in .txt:
##: with nullglob on, an unmatched glob expands to nothing.
shopt -s nullglob dotglob

files=(*.txt)
##: See https://mywiki.wooledge.org/BashFAQ/004
(( ${#files[*]} )) || {
  printf 'directory does not contain *.txt files!\n' >&2
  exit 1
}

shopt -u nullglob dotglob

for f in "${files[@]}"; do
  ##: Split from the first Upper case letter.
  [[ $f =~ ^([^[:upper:]]+)(.*)$ ]] || continue  ##: skip names that start with an uppercase letter
  mkdir -vp -- "${BASH_REMATCH[1]}" || exit
  mv -v -- "$f" "${BASH_REMATCH[1]}" || exit
done

Output

mkdir: created directory 'apple_'
renamed 'apple_CQ2rtQFD15H_1.txt' -> 'apple_/apple_CQ2rtQFD15H_1.txt'
renamed 'apple_CQ2rtQFD15H_3.txt' -> 'apple_/apple_CQ2rtQFD15H_3.txt'
renamed 'apple_CQVku8Qjdzx_1.txt' -> 'apple_/apple_CQVku8Qjdzx_1.txt'
mkdir: created directory 'bananna___'
renamed 'bananna___Bo3-mKXnozt___.txt' -> 'bananna___/bananna___Bo3-mKXnozt___.txt'
renamed 'bananna___CPf6gN5L3SP___.txt' -> 'bananna___/bananna___CPf6gN5L3SP___.txt'
renamed 'bananna___CTu8APZMomD___.txt' -> 'bananna___/bananna___CTu8APZMomD___.txt'
mkdir: created directory 'bananna_rotten___'
renamed 'bananna_rotten___Byj7BPXNnpE___.txt' -> 'bananna_rotten___/bananna_rotten___Byj7BPXNnpE___.txt'
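
To try this kind of grouping without risking real data, one option is to wrap it in a function and point it at a throwaway directory first. The sketch below is not the answer's script: the name group_by_prefix is hypothetical, it uses parameter expansion instead of BASH_REMATCH, and, like the output above, it keeps the underscores in the directory name and leaves the file names unchanged.

```shell
# group_by_prefix DIR: move each *.txt file in DIR into a subdirectory named
# after the part of its name that precedes the first uppercase letter.
# The body runs in a subshell, so the caller's working directory is unchanged.
group_by_prefix() (
    cd "$1" || exit 1
    for f in *.txt; do
        [ -f "$f" ] || continue          # also covers the case where nothing matched
        prefix=${f%%[[:upper:]]*}        # text before the first uppercase letter
        # Skip names that start with an uppercase letter or contain none.
        [ -n "$prefix" ] && [ "$prefix" != "$f" ] || continue
        mkdir -p -- "$prefix" && mv -- "$f" "$prefix/" || exit 1
    done
)

# Try it on a throwaway directory first.
sandbox=$(mktemp -d)
touch "$sandbox/apple_CQ2rtQFD15H_1.txt" "$sandbox/bananna___Bo3-mKXnozt___.txt"
group_by_prefix "$sandbox"
ls -R "$sandbox"
```

Once the listing looks right, the same function can be pointed at the real directory.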

CodePudding user response:

@ECHO OFF
SETLOCAL
rem The following setting for the source directory is a name
rem that I use for testing and deliberately include names which include spaces to make sure
rem that the process works using such names. These will need to be changed to suit your situation.

SET "sourcedir=u:\your files"
SET "tempfile=%temp%\####.$$$"
(
FOR /f "delims=" %%e IN (
 'dir /b /a-d "%sourcedir%\*" '
 ) DO ( FOR /f "delims=ABCD" %%y IN ("%%e") DO ECHO %%y:%%e
)
)>"%tempfile%"

FOR /f "tokens=1*delims=:" %%b IN ('sort /r "%tempfile%" ') do echo MD "%%b" 2>nul&echo MOVE "%%c" "%%b"

del "%tempfile%"

GOTO :EOF

Always verify against a test directory before applying to real data.

The for...%%e loop assigns each filename found to %%e, and the for...%%y loop then assigns the part before the first delimiter to %%y (in practice, list all of the upper-case letters A-Z as delimiters, not just ABCD).

The report in the tempfile thus looks like this, for example:

apple_:apple_CQ2rtQFD15H_1.txt

The second loop uses : as the delimiter (it can't appear in a filename) and sorts the report in reverse order so that the longest prefix appears first. For each line it makes the directory named by the part before the : (held in %%b; 2>nul suppresses the complaint if the directory already exists) and then moves the file named in %%c into that directory.

I've just echoed the commands for testing. You'd need to remove the echo keyword before the md and move to activate.

If necessary, you could prefix %sourcedir%\ before %%b or %%c as required.

The move command can be silenced by appending >nul
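
The prefix:filename report described above can be sketched in shell as well. This is only an illustration of the idea, not a translation of the batch file; the function name prefix_report is hypothetical, and LC_ALL=C is an assumption made here to keep the sort order predictable.

```shell
# prefix_report: read file names on stdin (one per line) and print
# "prefix:name" lines in reverse sorted order, where prefix is the text
# before the first uppercase letter of the name.
prefix_report() {
    while IFS= read -r name; do
        printf '%s:%s\n' "${name%%[[:upper:]]*}" "$name"
    done | LC_ALL=C sort -r
}

printf '%s\n' \
    'apple_CQ2rtQFD15H_1.txt' \
    'bananna___Bo3-mKXnozt___.txt' \
    'bananna_rotten___Byj7BPXNnpE___.txt' | prefix_report
```

Each output line can then be split on the first : to recover the directory name and the file to move.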

---- on second thoughts ----

Since each individual filename is available inside the loop, the temporary file isn't needed:

@ECHO OFF
SETLOCAL
rem The following setting for the source directory is a name
rem that I use for testing and deliberately include names which include spaces to make sure
rem that the process works using such names. These will need to be changed to suit your situation.

SET "sourcedir=u:\your files"

FOR /f "delims=" %%e IN (
 'dir /b /a-d "%sourcedir%\*" '
 ) DO ( FOR /f "delims=ABCD" %%y IN ("%%e") DO ECHO MD "%%y" 2>nul&ECHO MOVE "%%e" "%%y"
)

GOTO :EOF

CodePudding user response:

Here is one way to do it in PowerShell using Group-Object. -Force is used with New-Item so that the code is re-usable; as the New-Item documentation puts it:

Example 8: Use the -Force parameter to attempt to recreate folders

This example creates a folder with a file inside. Then, attempts to create the same folder using -Force. It will not overwrite the folder but simply return the existing folder object with the file created intact.

$initialPath = 'path/to/txtFiles'
$destination = 'path/where/to/createNewFolders'
# Filter all files in `$initialPath` whose names contain
# at least one `_` and whose extension is `.txt`
Get-ChildItem -LiteralPath $initialPath -Filter *_*.txt | Group-Object {
    # Group all files by everything up to the first `_` followed by an Uppercase Letter
    [regex]::Match($_.BaseName, '(.+?_)[A-Z]').Groups[1].Value
} | ForEach-Object {
    # Create a new path for this group of files
    $path = Join-Path $destination -ChildPath $_.Name
    # Create a new folder using above path
    $dir  = New-Item $path -ItemType Directory -Force
    # Move all files in this group to the new folder
    $_.Group | Move-Item -Destination $dir.FullName
}
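
The point about -Force has a counterpart in the shell answers, where mkdir -p plays the same role: it succeeds when the directory already exists and leaves its contents alone. A small sketch, with a hypothetical helper name ensure_dir and a throwaway directory from mktemp:

```shell
# ensure_dir PATH: create PATH if needed; succeed quietly if it already exists.
ensure_dir() {
    mkdir -p -- "$1"
}

demo=$(mktemp -d)                 # throwaway directory for the demonstration
ensure_dir "$demo/apple_"
touch "$demo/apple_/keep.txt"
ensure_dir "$demo/apple_"         # second call: no error, keep.txt is untouched
ls "$demo/apple_"
```

This is what makes such scripts safe to run more than once over the same destination.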

For a quick demo to see how the cmdlet works you can use this:

@'
apple_C_Q2rtQFD15H_1.txt
apple_CQVku8Qjdzx_1.txt
apple_CQ2rtQFD15H_3.txt
bananna___Bo3-mKXnozt___.txt
bananna___CPf6gN5L3SP___.txt
bananna___CTu8APZMomD___.txt
bananna_rotten___Byj7BPXNnpE___.txt
'@ -split '\r?\n' -as [System.IO.FileInfo[]] | Group-Object {
    [regex]::Match($_.BaseName, '(.+?_)[A-Z]').Groups[1].Value
}

And here is a demo for the regex: https://regex101.com/r/qK3Ppp/1

Since it's not clear if you want to exclude or include the underscores for the directory creation, here is another regex if you want to exclude the underscores:

@'
apple_C_Q2rtQFD15H_1.txt
apple_CQVku8Qjdzx_1.txt
apple_CQ2rtQFD15H_3.txt
bananna___Bo3-mKXnozt___.txt
bananna___CPf6gN5L3SP___.txt
bananna___CTu8APZMomD___.txt
bananna_rotten___Byj7BPXNnpE___.txt
'@ -split '\r?\n' -as [System.IO.FileInfo[]] | Group-Object {
    [regex]::Match($_.BaseName, '(.+?)_{1,}[A-Z]').Groups[1].Value
}