Grab random files from a directory using just bash

Time:02-22

I am looking to create a bash script that can grab files matching a certain glob pattern and cp them to another folder. For example:

$foo\ 
a.txt
b.txt
c.txt
e.txt
f.txt
g.txt 

If I run the script and request 2 files, I would get:

$bar\ 
c.txt 
f.txt 

I am not sure whether bash has a random number generator, or how to use one to pull entries from a list. The directory is also large (over 100K files), so some of the usual glob approaches won't work.

Thanks in advance

CodePudding user response:

Try this:

#!/bin/bash

sourcedir="files"

# Arguments processing
if [[ $# -ne 1 ]]
then
    echo "Usage: random_files.bash NUMBER-OF-FILES"
    echo "       NUMBER-OF-FILES: how many random files to select"
    exit 1
else
    numberoffiles="$1"
fi

# Build an array holding every file name under $sourcedir
listoffiles=()
while IFS='' read -r line; do listoffiles+=("$line"); done < <(find "$sourcedir" -type f -print)
totalnumberoffiles=${#listoffiles[@]}

# loop on the number of files the user wanted
for (( i=1; i<=numberoffiles; i++ ))
do
    # Select a random index between 0 and totalnumberoffiles - 1
    # (two RANDOM draws are combined because one alone never exceeds 32767)
    randomnumber=$(( (RANDOM * 32768 + RANDOM) % totalnumberoffiles ))
    echo "${listoffiles[$randomnumber]}"
done
  • build an array with the filenames
  • pick a random index from 0 to the size of the array minus 1
  • display the filename at that index
  • a loop repeats this if you want to randomly select more than one file
  • you could add another argument for the location of the files; it is hard-coded here.
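
Because each pick above is independent, the same file can come back more than once. A minimal pure-bash sketch of one way to get distinct picks, using a partial Fisher-Yates shuffle (a technique swapped in here, not part of the answer above). The function name pick_random is hypothetical, and file names are assumed not to contain newlines:

```bash
#!/bin/bash

# pick_random N ITEM...: print N distinct items chosen at random from the
# remaining arguments, one per line (hypothetical helper name).
pick_random() {
    local n=$1; shift
    local -a pool=("$@")
    local total=${#pool[@]} i j tmp

    # Never ask for more items than exist
    if (( n > total )); then n=$total; fi

    # Partial Fisher-Yates shuffle: for each of the first n slots, swap in
    # a randomly chosen item from the part of the array not yet picked.
    for (( i = 0; i < n; i++ )); do
        # Two 15-bit $RANDOM draws form a 30-bit value, so arrays longer
        # than 32767 entries are still covered (a small modulo bias remains)
        j=$(( i + (RANDOM << 15 | RANDOM) % (total - i) ))
        tmp=${pool[i]}
        pool[i]=${pool[j]}
        pool[j]=$tmp
        printf '%s\n' "${pool[i]}"
    done
}

# Possible usage: copy 2 random .txt files from foo/ to bar/
# while IFS= read -r f; do cp -- "$f" bar/; done < <(pick_random 2 foo/*.txt)
```

Passing a large glob expansion to a shell function stays inside bash, so the ARG_MAX limit that applies to external commands does not come into play here.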

Another method, if this one fails because of too many files in the same directory, could be:

#!/bin/bash

sourcedir="files"

# Arguments processing
if [[ $# -ne 1 ]]
then
    echo "Usage: random_files.bash NUMBER-OF-FILES"
    echo "       NUMBER-OF-FILES: how many random files to select"
    exit 1
else
    numberoffiles="$1"
fi

# Write the list of files to list.txt, one name per line
find "$sourcedir" -type f -print >list.txt
totalnumberoffiles=$(wc -l list.txt | awk '{print $1}')

# loop on the number of files the user wanted
for (( i=1; i<=numberoffiles; i++ ))
do
    # Select a random line number between 1 and $totalnumberoffiles
    # (two RANDOM draws are combined because one alone never exceeds 32767)
    randomnumber=$(( ( (RANDOM * 32768 + RANDOM) % totalnumberoffiles ) + 1 ))
    sed -n "${randomnumber}p" list.txt
done

/bin/rm -f list.txt
  • build a list of the files, with one filename per line
  • select a random number
  • here the random number has 1 added to it, since sed line numbers start at 1, not at 0 like array indices.
  • use sed to print the random line from the list of files
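
The list-file approach can also be sketched so that it draws distinct line numbers and reads the list only once. This is an illustration under assumptions, not the answer's own method: the function name print_random_lines is hypothetical, and it relies on associative arrays, which need bash 4 or later.

```bash
#!/bin/bash

# print_random_lines N FILE: print N distinct, randomly chosen lines of FILE
# (hypothetical helper name; requires bash 4+ for associative arrays).
print_random_lines() {
    local n=$1 file=$2 total line
    local -A chosen=()
    local -a sedargs=()

    total=$(wc -l < "$file")
    # Never ask for more lines than the file holds
    if (( n > total )); then n=$total; fi

    # Draw 1-based line numbers until n distinct ones have been collected;
    # a repeated draw just overwrites the same key
    while (( ${#chosen[@]} < n )); do
        line=$(( (RANDOM << 15 | RANDOM) % total + 1 ))
        chosen[$line]=1
    done

    # Build one "-e <line>p" expression per chosen line number
    for line in "${!chosen[@]}"; do
        sedargs+=(-e "${line}p")
    done

    # A single sed pass prints every chosen line
    if (( ${#sedargs[@]} > 0 )); then
        sed -n "${sedargs[@]}" "$file"
    fi
}
```

The selected lines come out in the order they appear in the file rather than in the order they were drawn, which makes no difference when the goal is only to copy them.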

CodePudding user response:

Using GNU shuf, this copies N random files matching the given glob pattern in the given source directory to the given destination directory.

#!/bin/bash -e

shopt -s failglob

n=${1:?} glob=${2:?} source=${3:?} dest=${4:?}
declare -i rand
IFS=

[[ -d "$source" ]]
[[ -d "$dest" && -w "$dest" ]]

cd "$dest"
dest=$PWD
cd "$OLDPWD"
cd "$source"

printf '%s\0' $glob |
shuf -zn "$n" |
xargs -0 cp -t "$dest"

Use like:

./cp-rand 2 '?.txt' /source/dir /dest/dir
  • This will work for a directory containing thousands of files. xargs will manage limits like ARG_MAX.

  • $glob, unquoted, undergoes filename expansion (glob expansion). Because IFS is empty, the glob pattern can contain whitespace.

  • Matching sub-directories will cause cp to fail and the script to exit prematurely (some files may have already been copied). Use cp -r to allow sub-directories.

  • cp -t target and xargs -0 are not POSIX.

  • Note that using a random number to select files from a list can cause duplicates, so you might copy fewer than N files. Hence the use of GNU shuf.
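
The point about the empty IFS can be seen in isolation with a small sketch. The function name expand_glob is hypothetical; it expands a pattern held in a variable against the current directory:

```bash
#!/bin/bash

# expand_glob PATTERN: print each name in the current directory that
# matches PATTERN, one per line (hypothetical helper name).
expand_glob() {
    local pattern=$1
    # Empty IFS: the unquoted expansion below is not split into words,
    # so a pattern containing spaces stays in one piece
    local IFS=
    # Left unquoted on purpose so that bash performs filename expansion.
    # Without failglob or nullglob, a pattern with no match is printed as-is.
    printf '%s\n' $pattern
}

# Possible usage, run inside the source directory:
# expand_glob 'my file *.txt'
```

With the default IFS, the same unquoted expansion would first be split into the words my, file and *.txt, and only the last of those would be treated as a glob.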
