The Problem
Given a Bash pipeline:
./a.sh | ./b.sh
where the PID of ./a.sh is 10, is there a way to find the PID of ./a.sh from within ./b.sh?
That is, if there is a way, and ./b.sh looks something like this:
#!/bin/bash
...
echo $LEFT_PID
cat
Then the output of ./a.sh | ./b.sh would be:
10
... followed by whatever ./a.sh printed to stdout.
Background
I'm working on a Bash script named cachepoint that I can place in a pipeline to speed things up. For example:
cat big_data | sed 's/a/b/g' | uniq -c | cachepoint | sort -n
This is a deliberately simple example. The pipeline may run slowly at first, but on subsequent runs it will be quicker, as cachepoint starts doing the work.
The way I picture cachepoint working is that it would use the first few hundred lines of its input, along with the list of commands before it in the pipeline, to form a hash ID for previously cached data. On subsequent runs, a matching ID would let it break the stdin pipeline early and print the cached data instead. Cached data would be deleted every hour or so.
That is, in normal circumstances everything left of | cachepoint would keep running, perhaps for 1,000,000 lines, but on subsequent executions of the same pipeline, everything left of | cachepoint would exit after maybe 100 lines, and cachepoint would simply print the millions of lines it has cached. To hash the pipe sources and pipe content, I need a way for cachepoint to read the PIDs of the processes before it in the pipeline.
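One possible way to turn this idea into a hash ID is sketched below, assuming the upstream PID is already known (finding it is the question itself). The helper names cache_key and sample_lines are hypothetical, not part of any existing tool, and the sketch assumes Linux /proc and GNU coreutils (tr, sha256sum, cut).

```shell
#!/bin/bash
# Sketch only: cache_key and sample_lines are hypothetical helpers.
# Assumes Linux (/proc) and GNU coreutils (tr, sha256sum, cut).

# cache_key PID SAMPLE: hash the command line of process PID together
# with SAMPLE (the first lines of input) and print the hex digest.
cache_key() {
    local pid=$1 sample=$2
    {
        # /proc/PID/cmdline holds argv as NUL-terminated strings; map the
        # NULs to spaces so the hashed text contains no NUL bytes.
        tr '\0' ' ' < "/proc/$pid/cmdline"
        printf '\n%s\n' "$sample"
    } | sha256sum | cut -d ' ' -f 1
}

# sample_lines N: copy at most N lines from stdin to stdout. On a pipe the
# read builtin takes input one byte at a time, so nothing past line N is
# consumed (head, by contrast, may read ahead on a pipe).
sample_lines() {
    local n=$1 line i
    for (( i = 0; i < n; i++ )); do
        # Also keep a final line that lacks a trailing newline.
        IFS= read -r line || [[ -n $line ]] || break
        printf '%s\n' "$line"
    done
}
```

Inside cachepoint, something like sample=$(sample_lines 100) followed by key=$(cache_key "$left_pid" "$sample") could name the cache file; the sampled lines would still have to be written downstream before the rest of stdin is passed through.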
I use pipelines a lot for exploring data sets, and I often find myself piping to temporary files to avoid repeating the same costly pipeline more than once. This is messy, so I want cachepoint.
Answer
This ShellCheck-clean code should work for your b.sh program on any Linux system:
#! /bin/bash
# extglob enables the +(...) pattern in the loop below;
# nullglob makes the pattern expand to nothing if it matches no paths
shopt -s extglob
shopt -s nullglob
left_pid=
# Get the identifier for the pipe connected to the standard input of this
# process (e.g. 'pipe:[10294010]')
input_pipe_id=$(readlink "/proc/self/fd/0")
if [[ $input_pipe_id != pipe:* ]]; then
    echo 'ERROR: standard input is not a pipe' >&2
    exit 1
fi
# Find the process that has standard output connected to the same pipe
for stdout_path in /proc/+([[:digit:]])/fd/1; do
    output_pipe_id=$(readlink -- "$stdout_path")
    if [[ $output_pipe_id == "$input_pipe_id" ]]; then
        procpid=${stdout_path%/fd/*}
        left_pid=${procpid#/proc/}
        break
    fi
done
if [[ -z $left_pid ]]; then
    echo "ERROR: Failed to set 'left_pid'" >&2
    exit 1
fi
echo "$left_pid"
cat
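The loop above stops at the first /proc entry whose fd 1 matches, in the order the glob expands (sorted as strings, not numerically). If ./a.sh starts child processes that inherit its standard output (for example a script that runs sed or sleep), several processes hold the same pipe, and the first match may be one of those children rather than ./a.sh itself. A variant that lists every match makes this visible; it is a sketch under the same Linux /proc assumption, and writers_of_stdin is a hypothetical name.

```shell
#!/bin/bash
# Sketch, assuming Linux /proc: print every PID whose standard output is the
# pipe attached to this process's standard input. writers_of_stdin is a
# hypothetical name; unlike the loop above it does not stop at the first match.
shopt -s nullglob

writers_of_stdin() {
    local input_pipe_id stdout_path pid
    input_pipe_id=$(readlink /proc/self/fd/0)
    [[ $input_pipe_id == pipe:* ]] || return 1
    # /proc/[0-9]* matches only the numeric (per-process) directories.
    for stdout_path in /proc/[0-9]*/fd/1; do
        # Processes that exited or cannot be inspected make readlink fail;
        # the empty result never matches, so they are skipped.
        if [[ $(readlink -- "$stdout_path" 2>/dev/null) == "$input_pipe_id" ]]; then
            pid=${stdout_path#/proc/}
            echo "${pid%%/*}"
        fi
    done
}
```

Called near the top of b.sh, it prints one PID per line; deciding which of several writers counts as "the" left-hand command is left to the caller.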
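- It depends on the fact that, on Linux, for a process with ID PID, the path /proc/PID/fd/0 looks like a symlink to the device connected to the standard input of the process, and /proc/PID/fd/1 looks like a symlink to the device connected to its standard output. For a pipe, the link target has the form pipe:[INODE], and both ends of the same pipe show the same target, so comparing the targets identifies the process writing into b.sh's standard input.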
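To see these pipe:[INODE] targets directly, the following Linux-only demo reads the link on both ends of one pipe. It is an illustration rather than part of the answer; the variable names are arbitrary.

```shell
#!/bin/bash
# Linux-only demo: both ends of one pipe resolve to the same "pipe:[INODE]"
# link target, which is what the comparison in b.sh relies on.
tmp=$(mktemp)

reader_id=$(
    {
        # $BASHPID is the PID of this left-hand subshell, whose fd 1 is the
        # pipe. readlink itself runs in a command substitution (its stdout is
        # not the pipe), so it inspects the subshell's fd 1 by PID.
        me=$BASHPID
        writer_link=$(readlink "/proc/$me/fd/1")
        printf '%s\n' "$writer_link" > "$tmp"
    } | readlink /proc/self/fd/0   # right-hand side: fd 0 is the pipe
)
writer_id=$(<"$tmp")
rm -f -- "$tmp"

echo "writer fd 1: $writer_id"
echo "reader fd 0: $reader_id"
```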