When piping in BASH, is it possible to get the PID of the left command from within the right command?

The Problem

Given a BASH pipeline:

./a.sh | ./b.sh

Suppose ./a.sh runs with PID 10.

Is there a way to find the PID of ./a.sh from within ./b.sh?

That is, if there is a way, and ./b.sh looks something like this:

#!/bin/bash
...
echo $LEFT_PID
cat

Then the output of ./a.sh | ./b.sh would be:

10
... Followed by whatever else ./a.sh printed to stdout.

Background

I'm working on this bash script, named cachepoint, that I can place in a pipeline to speed things up.

E.g. cat big_data | sed 's/a/b/g' | uniq -c | cachepoint | sort -n

This is a purposefully simple example.

The pipeline may run slowly at first, but on subsequent runs, it will be quicker, as cachepoint starts doing the work.

The way I picture cachepoint working is that it would hash the first few hundred lines of input, together with the list of commands before it in the pipeline, to form an ID for previously cached data. On subsequent runs it would break the stdin pipeline early and print the cached data instead. Cached data would be deleted every hour or so.

That is, everything left of | cachepoint would normally keep running, perhaps to 1,000,000 lines. On subsequent executions of a cachepoint pipeline, everything left of | cachepoint would exit after maybe 100 lines, and cachepoint would simply print the millions of lines it has cached. To hash the pipe sources along with the pipe content, I need a way for cachepoint to read the PIDs of what came before it in the pipeline.
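
To make the hashing idea concrete, here is a minimal sketch, not a finished cachepoint. The function name cache_key, its arguments, and the choice of sha256sum are assumptions made for illustration. The description of the upstream commands is passed in as a plain string, since obtaining it is exactly the open question.

```shell
#!/bin/bash

# Hypothetical helper: derive a cache key from a description of the upstream
# commands ($1) and the first $2 lines of standard input (default 100).
cache_key() {
    local upstream=$1 lines=${2:-100}
    {
        # Mix the upstream command description into the hashed data
        printf '%s\n' "$upstream"
        # Then add the leading lines of the piped input
        head -n "$lines"
    } | sha256sum | cut -d' ' -f1
}

# Example: the same commands and the same leading input give the same key
printf 'a\nb\n' | cache_key "sed 's/a/b/g' | uniq -c"
```

Note that this sketch consumes the lines it hashes, and head may read more than it prints. A real tool would have to buffer that input so it can still be passed on or cached.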

I use pipelines a lot for exploring data sets, and I often find myself piping to temporary files in order to bypass repeating the same costly pipeline more than once. This is messy, so I want cachepoint.

Answer:

This Shellcheck-clean code should work for your b.sh program on any Linux system:

#! /bin/bash

shopt -s extglob
shopt -s nullglob

left_pid=

# Get the identifier for the pipe connected to the standard input of this
# process (e.g. 'pipe:[10294010]')
input_pipe_id=$(readlink "/proc/self/fd/0")
if [[ $input_pipe_id != pipe:* ]]; then
    echo 'ERROR: standard input is not a pipe' >&2
    exit 1
fi

# Find the process that has standard output connected to the same pipe
for stdout_path in /proc/+([[:digit:]])/fd/1; do
    output_pipe_id=$(readlink -- "$stdout_path")
    if [[ $output_pipe_id == "$input_pipe_id" ]]; then
        procpid=${stdout_path%/fd/*}
        left_pid=${procpid#/proc/}
        break
    fi
done

if [[ -z $left_pid ]]; then
    echo "ERROR: Failed to set 'left_pid'" >&2
    exit 1
fi

echo "$left_pid"
cat

It relies on the fact that, on Linux, for a process with ID PID, the path /proc/PID/fd/0 appears as a symlink to whatever is connected to the standard input of the process, and /proc/PID/fd/1 appears as a symlink to whatever is connected to its standard output. For a pipe, both ends resolve to the same identifier (e.g. 'pipe:[10294010]'), which is what the loop compares.
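
The /proc behaviour can be checked in isolation with a small sketch like the one below. The helper name fd_target is made up for this example, and it assumes a Linux system with /proc mounted.

```shell
#!/bin/bash

# Hypothetical helper: print what file descriptor $2 of process $1 points to.
# $1 may be a numeric PID or 'self'; with 'self' it refers to the readlink
# process itself, which inherits its descriptors from the caller.
fd_target() {
    readlink -- "/proc/$1/fd/$2"
}

# With standard input coming from a pipe, this prints an identifier of the
# form 'pipe:[<inode number>]'
: | fd_target self 0
```

Run from a terminal without a pipe, fd_target self 0 would instead print the terminal device path, which is why the script above checks for the pipe: prefix first.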