Is it possible to read the same pipe twice in bash?


Here is my code:

ls | grep -E '^application--[0-9]{4}-[0-9]{2}.tar.gz$' | awk '{if($1<"application--'"${CLEAR_DATE_LEVEL0}"'.tar.gz") print $1}' | xargs -r echo
ls | grep -E '^application--[0-9]{4}-[0-9]{2}.tar.gz$' | awk '{if($1<"application--'"${CLEAR_DATE_LEVEL0}"'.tar.gz") print $1}' | xargs -r rm

As you can see, it gets a list of files, shows them on screen (for logging purposes) and then deletes them. The issue is that if a file is created between the execution of the first and the second line, I will delete a file without ever logging that fact.

Is there a way to create a script that will read the same pipe twice, so that the awk result is piped both to the xargs echo and to the xargs rm command?

I know I can use a file as a temporary buffer, but I would like to avoid that.

CodePudding user response:

You can use tee together with process substitution: tee passes the list through to stdout (your log) while also feeding a copy to the rm process. A quick demo:

touch example                  # create a demo file
ls example* | tee >(xargs rm)  # the list goes to stdout and, via tee, to rm

I would prefer to avoid parsing ls:

while IFS= read -r file; do
  file="${file#./}"   # strip the leading "./" that find prints, so the comparison works
  if [[ "$file" < "application--${CLEAR_DATE_LEVEL0}.tar.gz" ]]; then
    echo "Removing ${file}"
    rm "${file}"
  fi
done < <(find . -regextype egrep -regex '\./application--[0-9]{4}-[0-9]{2}\.tar\.gz')

EDIT: An improvement:
As @tripleee mentioned in their answer, using rm -v avoids the additional echo, and also avoids logging a file when its removal failed.

CodePudding user response:

No, a pipe is a stream - once you read something from it, it is forever gone from the pipe.

A good general solution is to use a temporary file; this lets you rewind and replay it. Just take care to remove it when you're done.

temp=$(mktemp -t) || exit
trap 'rm -f "$temp"' ERR EXIT
cat >"$temp"        # capture all of standard input in the temporary file

cat "$temp"         # replay it once, for logging
xargs rm <"$temp"   # replay it again, this time feeding rm
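For illustration (a sketch of my own, not part of the original answer): saved as an executable script, this becomes a reusable filter for any pipeline that produces the doomed file names. Both names below are hypothetical:

# "generate_file_list" stands for any command printing one file name per line;
# "rmsponge" is the illustrative script name discussed below
generate_file_list | rmsponge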

Of these pseudo-signals, EXIT is POSIX, but ERR is a Bash extension. For POSIX portability, you need a somewhat more involved set of trap commands, such as the sketch below.
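A minimal POSIX-compatible sketch (my assumption of what such a trap set could look like): let the EXIT trap do the cleanup, and turn the common signals into normal exits so that the EXIT trap still fires:

trap 'rm -f "$temp"' EXIT
# exiting from inside a signal trap still runs the EXIT trap above
trap 'exit 129' HUP
trap 'exit 130' INT
trap 'exit 143' TERM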

Properly speaking, mktemp should receive an argument which is used as a template for the temporary file's name, so that the user can see which temporary file belongs to which tool. For example, if this script was called rmsponge, you could use mktemp rmspongeXXXXXXXXX to have mktemp generate a temporary file name which begins with rmsponge.
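For example, one way to combine such a template with the usual temporary directory (a sketch; the rmsponge name is only illustrative):

temp=$(mktemp "${TMPDIR:-/tmp}/rmsponge.XXXXXXXXX") || exit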

If you only expect a limited amount of input, perhaps just capture the input in a variable. However, this scales poorly, and can fail in rather unfortunate ways if the input data exceeds available memory:

# XXX avoid: scales poorly
values=$(cat)
xargs printf "%s\n" <<<"$values"
xargs rm <<<"$values"

The <<< "here string" syntax is also a Bash extension. This approach additionally suffers from the various issues described in https://mywiki.wooledge.org/BashFAQ/020 but that is inherent to the way you have articulated the problem.

Of course, in this individual case, just use rm -v to see which files rm removes.

CodePudding user response:

For your specific case, you don't need to read the pipe twice, you can just use rm -v to have rm itself also "echo" each file.
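Applied to your pipeline, that collapses your two commands into one (a sketch; the glob-based loop below is still the more robust option):

ls | grep -E '^application--[0-9]{4}-[0-9]{2}.tar.gz$' \
  | awk '{if($1<"application--'"${CLEAR_DATE_LEVEL0}"'.tar.gz") print $1}' \
  | xargs -r rm -v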

Also, in cases like this, it is better for shell scripts to use globs instead of grep ..., both for robustness and for performance.

And once you do that, even better: you can loop over the glob and not go through any pipes at all. That is even more robust in the general case, because there are fewer places where you have to worry "could a character in this be special to that program?", and it may perform better because everything stays in one process:

for file in application--[0-9][0-9][0-9][0-9]-[0-9][0-9].tar.gz
do
    if [[ "$file" < "application--${CLEAR_DATE_LEVEL0}.tar.gz" ]]
    then
        # echo "$file"
        # rm "$file"
        rm -v "$file"
    fi
done

But if you find yourself in a situation where you really do need to get data from a pipe and a glob won't work, there are a couple of ways:

One neat trick in the shell is that loops and other compound commands can sit at the end of a pipeline - so a loop can read from a pipe, and the body of the loop can contain all the commands you wanted to read from the pipe:

ls ... | awk ... | while IFS="" read -r file
do
    # echo "$file"
    # rm "$file"
    rm -v "$file"
done

(As a general best practice, you'd want to set IFS= to the empty string for the read command so that read doesn't split the input on characters like spaces, and give read the -r argument to tell it not to interpret special characters like backslashes. In your specific case it doesn't matter.)
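One more caveat (my addition, not part of the original answer): in bash, every stage of a pipeline runs in a subshell by default, so any variables you set inside such a loop are gone once the loop ends:

count=0
ls | while IFS="" read -r file; do
    count=$((count + 1))
done
echo "$count"   # still 0: the loop body ran in a subshell

If you need those variables afterwards, feed the loop with process substitution (done < <(ls ... | awk ...)) or enable shopt -s lastpipe.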

But if a loop doesn't work for what you need, then in the general case, you can catch the result of a pipe in a shell array variable:

pipe_contents=($(ls application--[0-9][0-9][0-9][0-9]-[0-9][0-9].tar.gz | awk '{if($1<"application--'"${CLEAR_DATE_LEVEL0}"'.tar.gz") print $1}'))

echo "${pipe_contents[@]}"
rm "${pipe_contents[@]}"

(This works fine unless your pipe output contains characters that would be special to the shell at the point that the pipe output has to be unquoted - in this case, the array is using field splitting on the unquoted pipe output, so any whitespace or globbing characters in the file names would be bad, but in this case you don't have any of those.)
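If you ever do need to cope with whitespace in the file names, bash's mapfile builtin (also spelled readarray) reads one line per array element with no field splitting or globbing (a sketch; it still cannot handle newlines inside file names):

mapfile -t pipe_contents < <(
    ls application--[0-9][0-9][0-9][0-9]-[0-9][0-9].tar.gz \
      | awk '{if($1<"application--'"${CLEAR_DATE_LEVEL0}"'.tar.gz") print $1}'
)
# guard against an empty array so rm is never called with no operands
if [ "${#pipe_contents[@]}" -gt 0 ]; then
    printf '%s\n' "${pipe_contents[@]}"
    rm -- "${pipe_contents[@]}"
fi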

If you need your solution to work in POSIX sh, or otherwise more portably than just bash, then you can't use arrays, so you end up having to settle for something like this:

pipe_contents="$(ls application--[0-9][0-9][0-9][0-9]-[0-9][0-9].tar.gz | awk '{if($1<"application--'"${CLEAR_DATE_LEVEL0}"'.tar.gz") print $1}')"

echo "$pipe_contents"
rm $pipe_contents  # gotta leave it unquoted

You can also use a temporary file instead of a shell variable, but you said you want to avoid that. I also prefer a variable when the data fits in memory, because Linux/UNIX does not give shell scripts a reliable way to clean up external resources (you can use trap, but traps cannot run on uncatchable signals such as SIGKILL).

P.S. As a general habit, you should ideally use printf '%s\n' "$foo" instead of echo "$foo", because echo has various special cases (and portability inconsistencies, though those matter less if you always use bash, until you need to care about portable sh).
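For example, a file named -n trips up echo but not printf:

file='-n'
echo "$file"            # bash's echo takes -n as an option and prints nothing
printf '%s\n' "$file"   # reliably prints the literal string: -n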
