Sorry if this question has been asked before; I couldn't find a good solution!
I have a couple of files that follow this exact format.
file1.txt
Page 1
text under page 1 for file 1
Page 2
text under page 2 for file 1
file2.txt
Page 1
text under page 1 for file 2
Page 2
text under page 2 for file 2
I need to merge these two files, using the "Page" lines as delimiters. The third file should look something like this:
file3.txt
Page 1
text under page 1 for file 1
text under page 1 for file 2
Page 2
text under page 2 for file 1
text under page 2 for file 2
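If you want to try the answers locally, the two sample inputs can be recreated with here-documents. This is only a convenience sketch that reuses the file names and contents from the question.

```bash
#!/bin/bash
# Recreate the question's sample inputs (contents copied from the question)
cat > file1.txt <<'EOF'
Page 1
text under page 1 for file 1
Page 2
text under page 2 for file 1
EOF

cat > file2.txt <<'EOF'
Page 1
text under page 1 for file 2
Page 2
text under page 2 for file 2
EOF
```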
How would I go about achieving this?
Answer:
If file1.txt has an entry for every page, you could also split both files with csplit and append the pieces in order:
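#!/bin/bash
# Split both files before every line that starts with "Page ".
# file1's pieces (file100, file101, ...) keep their "Page N" line;
# --suppress-matched drops it from file2's pieces (file200, file201, ...).
csplit -f file1 file1.txt '/^Page /' '{*}'
csplit -f file2 --suppress-matched file2.txt '/^Page /' '{*}'
# Start from an empty file3.txt so that reruns do not append twice
: > file3.txt
# Strip the "file1"/"file2" prefix to get each piece number, then append the pieces in order
ls file??? | sed 's/file.//g' | sort -u | while read -r nr ; do
    cat "file1$nr" >> file3.txt
    cat "file2$nr" >> file3.txt
done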
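Note: this is a quick-and-dirty solution. It assumes that every page appears in both documents (you should probably test whether each piece exists before appending it), it relies on GNU csplit for --suppress-matched and '{*}', and the loop parses the output of ls, which only works as long as the file names in the directory are not exotic.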
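The suggestions in the note can be folded into a variant of the csplit approach. The sketch below wraps it in a hypothetical function, merge_pages, splits into a temporary directory, loops over a glob instead of parsing ls, and skips pieces that are missing from the second file. It still pairs pages by position rather than by page number and assumes GNU csplit; the '/^Page /' pattern is anchored so that text lines that merely contain "Page " are not treated as delimiters.

```bash
#!/bin/bash
# merge_pages FIRST SECOND OUTPUT  (hypothetical helper; needs GNU csplit)
# Pages are paired by position: the Nth page of FIRST with the Nth page of SECOND.
merge_pages() {
    local first=$1 second=$2 out=$3 tmp part nr
    tmp=$(mktemp -d) || return 1

    # Split FIRST before every line starting with "Page ": a0000 holds anything
    # before the first header, a0001 is page 1 (header included), and so on.
    # Four-digit suffixes keep the glob below in numeric order (up to 10000 pieces).
    # For SECOND, --suppress-matched drops the header lines so each prints only once.
    if ! csplit -s -n 4 -f "$tmp/a" "$first" '/^Page /' '{*}' ||
       ! csplit -s -n 4 -f "$tmp/b" --suppress-matched "$second" '/^Page /' '{*}'; then
        rm -rf "$tmp"
        return 1
    fi

    : > "$out"                       # truncate so reruns do not append twice
    for part in "$tmp"/a*; do        # glob instead of parsing ls
        nr=${part##*/a}              # e.g. .../a0002 -> 0002
        cat "$part" >> "$out"
        # Append SECOND's piece only if that page exists there
        if [ -f "$tmp/b$nr" ]; then
            cat "$tmp/b$nr" >> "$out"
        fi
    done
    rm -rf "$tmp"
}
```

Usage: merge_pages file1.txt file2.txt file3.txt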
Answer:
Here is another possible solution:
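#!/bin/bash
# 1. Tag every text line with its most recent "Page" header ("Page N|text").
# 2. Sort on the header only: -s (stable) keeps file1's lines ahead of file2's,
#    and V (GNU version sort) puts "Page 2" before "Page 10".
# 3. Print each header once, followed by its text lines, into file3.txt.
while IFS= read -r line; do
    case $line in
        Page*) header="$line" ;;
        *) printf '%s|%s\n' "$header" "$line" ;;
    esac
done < <(cat file1.txt file2.txt) |
sort -s -t '|' -k1,1V |
(
    prev_header=
    while IFS='|' read -r header line; do
        if [ "$prev_header" != "$header" ]; then
            prev_header="$header"
            printf '%s\n' "$header"
        fi
        printf '%s\n' "$line"
    done
) > file3.txt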
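A single-pass alternative that avoids sort altogether is to buffer each page's lines in awk and print the pages in the order they first appear. This is a sketch with a hypothetical function name, merge_pages_awk. It keys pages by the full header line (so "Page 1" must be spelled identically in both files), keeps everything in memory, and ignores any lines that come before the first "Page" header.

```bash
#!/bin/bash
# merge_pages_awk FILE...  (hypothetical helper; plain POSIX awk, no sort)
merge_pages_awk() {
    awk '
        /^Page / {                      # a header line starts (or revisits) a page
            page = $0
            if (!(page in seen)) {      # remember pages in order of first appearance
                seen[page] = 1
                order[++n] = page
            }
            next
        }
        { text[page] = text[page] $0 "\n" }   # append the line to its page buffer
        END {
            for (i = 1; i <= n; i++) {
                print order[i]                 # header once
                printf "%s", text[order[i]]    # then all its lines, in input order
            }
        }
    ' "$@"
}
```

Usage: merge_pages_awk file1.txt file2.txt > file3.txt. Because the files are read in the order given, each page lists file1's lines before file2's.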