More "random" alternative to shuf for selecting files in a directory

Time:10-20

I put together the following Bash function (in my .bashrc) to open a "random" image from a given folder, one at a time until the user types N, after which it exits. The script works fine except for the actual randomness of the images it picks: in a quick test of 10 runs, only 4 images were unique.

Is this simply unavoidable due to the limited number of images in the directory (20), or is there an alternative to the shuf command that will yield more random results?

If it is unavoidable, what's the best way to adapt the function to avoid repeats (i.e. discard images that have already been selected)?

function generate_image() {
    while true; do
        command cd "D:\Users\Hashim\Pictures\Data" &&
        image="$(find . -type f -exec file --mime-type {} \; | awk -F: '{if ($2 ~/image\//) print $1}' | shuf -n1)" &&
        echo "Opening $image" &&
        cygstart "$image"
        read -p "Open another random image? [Y/n]"$'\n' -n 1 -r
        echo
        if [[ $REPLY =~ ^[Nn]$ ]]
        then return  # "exit" would close the whole interactive shell, since this lives in .bashrc
        fi
    done
}

Answer:

One way to handle this is by searching the filesystem and creating an array with a list of files in randomized order, and going through everything in that list before searching again.

Because you go through every entry of one batch of shuf output before starting the next batch, there's no risk of a repeat until everything has been seen.
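On the first part of the question: `shuf -n1` makes an independent pick on every run, so repeats are expected with a small pool rather than being a sign of poor randomness. A small sketch of the birthday-problem arithmetic, computing the chance that k independent uniform picks from n files are all distinct; `p_all_distinct` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Probability that k independent, uniform picks from n items contain no repeat:
#   (n/n) * ((n-1)/n) * ... * ((n-k+1)/n)
p_all_distinct() {
  local n=$1 k=$2
  awk -v n="$n" -v k="$k" 'BEGIN {
    p = 1
    for (i = 0; i < k; i++) p *= (n - i) / n   # pick i must avoid the i earlier picks
    printf "%.4f\n", p
  }'
}

p_all_distinct 20 10   # 20 images, 10 runs
```

For 20 images and 10 runs this prints 0.0655, i.e. roughly a 93% chance of at least one repeat, which is why avoiding repeats needs a "deal from a shuffled deck" approach rather than a different random source.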

# aside: I'm surprised you don't need to use cygpath to convert this
image_dir='D:/Users/Hashim/Pictures/Data'

refresh_image_list() {
  readarray -d '' image_list < <(
    find "$image_dir" -type f -exec file -0 --mime-type -- {} + \
    | while IFS= read -r -d '' filename && IFS= read -r desc; do
        [[ $desc = *image* ]] && printf '%s\0' "$filename"
      done \
    | shuf -z
  )
}
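The `-d ''` / `-z` / `\0` plumbing in `refresh_image_list` is what keeps file names containing spaces or even newlines intact. A minimal sketch of just that part, using a made-up list of names in place of the `find`/`file` pipeline (assumes bash 4.4+ for `readarray -d`):

```bash
#!/usr/bin/env bash
# Shuffle NUL-terminated names and read them back into an array, one element per name.
# The sample names are made up; note the space and the embedded newline.
readarray -t -d '' shuffled < <(
  printf '%s\0' 'beach.jpg' 'my photo.png' $'odd\nname.gif' | shuf -z
)

echo "got ${#shuffled[@]} entries"
for f in "${shuffled[@]}"; do
  printf '[%s]\n' "$f"   # brackets show where each name starts and ends
done
```

With newline-delimited output, `odd\nname.gif` would split into two bogus entries; with NUL delimiters the array always has exactly one element per file.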

generate_image() {
  while true; do
    (( ${#image_list[@]} )) || refresh_image_list  # if list is empty, recreate
    (( ${#image_list[@]} )) || { echo "No images found in $image_dir" >&2; return 1; }
    set -- "${image_list[@]}"              # set argument list from image list
    while (( $# )); do                     # argument list isn't empty?
      echo "Opening $1"                    # ...try the first item on it
      cygstart "$1"
      shift                                # ...and then discard that item
      read -p $'Open another random image? [Y/n]\n' -n 1 -r
      echo
      if [[ $REPLY = [Nn] ]]; then         # user wants to quit?
        image_list=( "$@" )                # store unused images back to list
        return 0
      fi
    done
    image_list=( )                         # batch used up: force a reshuffle on the next pass
  done
}
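The core idea above is "deal from a shuffled deck, reshuffle only when it's empty". A self-contained sketch of just that logic, with a hypothetical `next_item` helper and made-up item names standing in for the image list and `cygstart`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: hand out items from a shuffled "deck", reshuffling only
# once the deck is empty, so nothing repeats until every item has been used.
items=(img01.jpg img02.jpg img03.jpg img04.jpg img05.jpg)   # stand-in for the image list
deck=()

next_item() {
  if (( ${#deck[@]} == 0 )); then
    # new random order, NUL-delimited so odd file names survive
    readarray -t -d '' deck < <(printf '%s\0' "${items[@]}" | shuf -z)
  fi
  # result goes into a global: calling this inside $(...) would run in a
  # subshell and the removal from the deck would be lost
  REPLY_ITEM=${deck[0]}
  deck=( "${deck[@]:1}" )   # drop the item just dealt
}

for round in 1 2; do
  for (( i = 0; i < ${#items[@]}; i++ )); do
    next_item
    printf 'round %d: %s\n' "$round" "$REPLY_ITEM"
  done
done
```

Each round lists all five names exactly once, in a different random order; repeats can only happen across a reshuffle boundary.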

We can simplify this if we're willing to just stop after the user has seen every image once, instead of generating a new batch, and don't need persistence across invocations:

generate_image() {
  while IFS= read -r -d '' filename <&3; do
    echo "Opening $filename"
    cygstart "$filename"
    read -p $'Open another random image? [Y/n]\n' -n 1 -r
    echo
    [[ $REPLY = [Nn] ]] && return 0
  done 3< <(
    find "$image_dir" -type f -exec file -0 --mime-type -- {} + \
    | while IFS= read -r -d '' filename && IFS= read -r desc; do
        [[ $desc = *image* ]] && printf '%s\0' "$filename"
      done \
    | shuf -z
  )
}
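Reading the list on file descriptor 3 (`<&3` with `3< <(...)`) keeps the loop's stdin free, so `read -p` still gets the user's keypress instead of swallowing a file name. A sketch of just that mechanism, with a hypothetical `show_until_no` function, made-up names, and `echo` in place of `cygstart`:

```bash
#!/usr/bin/env bash
# Names come from fd 3; answers come from stdin (the terminal, in real use).
show_until_no() {
  local name
  while IFS= read -r -d '' name <&3; do
    echo "Opening $name"
    read -r -n 1 REPLY      # one keypress from stdin, not from the file list
    [[ $REPLY = [Nn] ]] && return 0
  done 3< <(printf '%s\0' a.jpg b.jpg c.jpg d.jpg)
}

# Simulate a user pressing y, y, n:
printf 'yyn' | show_until_no
```

This opens a.jpg, b.jpg and c.jpg, then stops on the "n". If the list were fed on stdin instead, the `read -n 1` would consume a character of the next file name.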

Answer:

File listings are rarely so large that they can't fit into RAM for awk, so the whole list can be read in one go:

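 find … -print0 |
 mawk 'BEGIN {    FS = "\0"              # one field per file name
              _^= RS = "^$"            # slurp all input as one record; _ becomes 0^0 = 1
       } END { printf("%*s", srand()*!_, $(int(rand()*(NF-_)) + _)) }'   # srand()*!_ seeds rand() and yields width 0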

That'll randomly print out the filename for one of the image files found, with no trailing byte of either \0 or \n, without having to perform any sort of sorting/shuffling.

The `NF - 1` (written `NF-_`, since `_` is 1) is needed because find prints a trailing \0, so NF is always one more than the number of files found.

It also protects against an empty input instead of referencing a negative field number - simply nothing gets printed at all.

From there, you can decide you want to open this image file.
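If you'd rather stay in bash than rely on mawk's handling of `\0` in FS, the same pick-one-without-shuffling idea can be sketched with `readarray` and `$RANDOM`. This swaps awk's `rand()` for bash's `$RANDOM`; `pick_random_nul` is a hypothetical name, and bash 4.4+ is assumed:

```bash
#!/usr/bin/env bash
# Pick one entry at random from a NUL-terminated list on stdin, without shuffling.
# Prints nothing and returns 1 for empty input.
pick_random_nul() {
  local -a names
  readarray -t -d '' names
  (( ${#names[@]} )) || return 1
  # $RANDOM is 0..32767, so the modulo bias is negligible for a few dozen files
  printf '%s' "${names[RANDOM % ${#names[@]}]}"
}

# Example with made-up names (in real use: find … -print0 | pick_random_nul)
printf '%s\0' a.jpg 'b c.png' d.gif | pick_random_nul; echo
```

Like the awk version, this picks independently on every call, so it does not by itself prevent repeats across runs.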
