I am looking to create a bash script that can grab random files fitting a certain glob pattern and copy them to another folder. For example, given:
$foo\
a.txt
b.txt
c.txt
e.txt
f.txt
g.txt
If I run the script and request 2 files, I would get:
$bar\
c.txt
f.txt
I am not sure whether bash has a random number generator, or how to use one to pull from a list. The directory is also large (over 100K), so some of the glob-based approaches won't work.
Thanks in advance
CodePudding user response:
Try this:
#!/bin/bash
sourcedir="files"
# Arguments processing
if [[ $# -ne 1 ]]
then
echo "Usage: random_files.bash NUMBER-OF-FILES"
echo " NUMBER-OF-FILES: how many random files to select"
exit 0
else
numberoffiles="$1"
fi
# Validations
listoffiles=()
while IFS='' read -r line; do listoffiles+=("$line"); done < <(find "$sourcedir" -type f -print)
totalnumberoffiles=${#listoffiles[@]}
# loop on the number of files the user wanted
for (( i=1; i<=numberoffiles; i++ ))
do
# Select a random number between 0 and $totalnumberoffiles
randomnumber=$(( RANDOM % totalnumberoffiles ))
echo "${listoffiles[$randomnumber]}"
done
- build an array with the filenames
- pick a random number from 0 to the size of the array minus one
- display the filename at that index
- I built in a loop in case you want to randomly select more than one file
- you can set up another argument for the location of the files; I hard-coded it here.
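One caveat for a directory with over 100K files: bash's RANDOM only yields values from 0 to 32767, so RANDOM % totalnumberoffiles can never reach an index above 32767. A minimal sketch of one possible workaround, combining two RANDOM draws into a 30-bit value. The name random_index is hypothetical and not part of the script above, and the modulo still leaves a slight bias:

```bash
#!/bin/bash
# random_index is a hypothetical helper, not part of the original script.
# Prints a pseudo-random integer in the range 0 .. max-1.
random_index() {
    local max=$1
    # Refuse a non-positive range (this also avoids a division by zero).
    (( max > 0 )) || return 1
    # Each RANDOM draw is 15 bits (0..32767); two draws give 30 bits.
    local r=$(( (RANDOM << 15) | RANDOM ))
    echo $(( r % max ))
}

# Example: pick an index into a list of 100000 entries.
random_index 100000
```

In the script above, this would stand in for the RANDOM % totalnumberoffiles expression.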
Another method, if this one fails because of too many files in the same directory, could be:
#!/bin/bash
sourcedir="files"
# Arguments processing
if [[ $# -ne 1 ]]
then
echo "Usage: random_files.bash NUMBER-OF-FILES"
echo " NUMBER-OF-FILES: how many random files to select"
exit 0
else
numberoffiles="$1"
fi
# Validations
find "$sourcedir" -type f -print >list.txt
totalnumberoffiles=$(wc -l list.txt | awk '{print $1}')
# loop on the number of files the user wanted
for (( i=1; i<=numberoffiles; i++ ))
do
# Select a random number between 1 and $totalnumberoffiles
randomnumber=$(( ( RANDOM % totalnumberoffiles ) + 1 ))
sed -n "${randomnumber}p" list.txt
done
/bin/rm -f list.txt
- build a list of the files, so that each filename will be on one line
- select a random number
- in this version, 1 must be added to the random number, since line numbers start at 1, not at 0 like array indexes
- use sed to print the random line from the list of files
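With a long list, sed -n "${randomnumber}p" keeps reading to the end of list.txt after it has printed the line. A small sketch that stops at the selected line instead; nth_line is a hypothetical helper name, not something from the script above:

```bash
#!/bin/bash
# nth_line is a hypothetical helper, not part of the original script.
# Prints line number $1 (1-based) of file $2, then stops reading.
nth_line() {
    local n=$1 file=$2
    # p prints the line, q quits so the rest of the file is not scanned.
    sed -n "${n}{p;q;}" "$file"
}

# Example: build a small list and print its second line.
list=$(mktemp)
printf '%s\n' a.txt b.txt c.txt > "$list"
nth_line 2 "$list"    # prints b.txt
rm -f "$list"
```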
CodePudding user response:
Using GNU shuf, this copies N random files matching the given glob pattern in the given source directory to the given destination directory.
#!/bin/bash -e
shopt -s failglob
n=${1:?} glob=${2:?} source=${3:?} dest=${4:?}
declare -i rand
IFS=
[[ -d "$source" ]]
[[ -d "$dest" && -w "$dest" ]]
cd "$dest"
dest=$PWD
cd "$OLDPWD"
cd "$source"
printf '%s\0' $glob |
shuf -zn "$n" |
xargs -0 cp -t "$dest"
Use like:
./cp-rand 2 '?.txt' /source/dir /dest/dir
- This will work for a directory containing thousands of files. xargs will manage limits like ARG_MAX.
- $glob, unquoted, undergoes filename expansion (glob expansion). Because IFS is empty, the glob pattern can contain whitespace.
- Matching sub-directories will cause cp to error and the script to exit prematurely (some files may have already been copied). Use cp -r to allow sub-directories.
- cp -t target and xargs -0 are not POSIX.
- Note that using a random number to select files from a list can cause duplicates, so you might copy fewer than N files. Hence using GNU shuf.
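As a possible variation, find can do the pattern matching itself, so the shell never has to expand the glob, and -type f skips matching sub-directories. This is only a sketch, assuming GNU find, shuf, xargs and cp; the function name copy_random is hypothetical:

```bash
#!/bin/bash
# copy_random is a hypothetical helper name.
# Assumes GNU find, shuf, xargs and cp (none of these options are POSIX).
# Usage: copy_random N GLOB SOURCE DEST
copy_random() {
    local n=$1 glob=$2 source=$3 dest=$4
    # find matches the pattern itself, so the shell never expands the glob.
    # -maxdepth 1 stays in the top directory; -type f skips sub-directories.
    find "$source" -maxdepth 1 -type f -name "$glob" -print0 |
        shuf -z -n "$n" |             # pick up to N distinct entries
        xargs -0 -r cp -t "$dest" --  # -r: do nothing if nothing matched
}
```

Use like:
copy_random 2 '?.txt' /source/dir /dest/dir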