Merging two files with common paragraph headings in bash [closed]

Time:09-16

Sorry if this question has been asked before, I couldn't find a good solution!

I have a couple of files that follow this exact format.

file1.txt

Page 1
text under page 1 for file 1
Page 2
text under page 2 for file 1

file2.txt

Page 1
text under page 1 for file 2
Page 2
text under page 2 for file 2

I need to merge these two files, using the "Page" lines as delimiters, so that my third file looks something like this:

file3.txt

Page 1
text under page 1 for file 1
text under page 1 for file 2
Page 2
text under page 2 for file 1
text under page 2 for file 2

How would I go about achieving this?

CodePudding user response:

If file1.txt has an entry for every page, you could also use csplit:

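#!/bin/bash
# Split each file at every "Page" line (requires GNU csplit).
# Pieces are named file100, file101, ... and file200, file201, ...
csplit -s -f file1 file1.txt '/^Page /' '{*}'
# --suppress-matched drops the "Page" lines so the headings are not duplicated.
csplit -s -f file2 --suppress-matched file2.txt '/^Page /' '{*}'
: > file3.txt    # start with an empty output file
ls file??? | sed 's/file.//' | sort -u | while read -r nr; do
        cat "file1$nr" >> file3.txt
        cat "file2$nr" >> file3.txt
done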

Note: this is a quick and dirty solution. It assumes all pages are in both documents, so you should probably test whether each file exists before appending it. The loop also parses the output of ls, which only works if the filenames in the directory are not exotic (no spaces, newlines or glob characters).
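
The two cautions in the note could be addressed roughly like this. It is only a sketch of the loop: append_pieces is a made-up helper name, and it assumes the csplit pieces (file100, file101, ... and file200, file201, ...) already exist in the current directory and that the pages appear in the same order in both files. It walks the file1 pieces with a glob instead of ls and only appends a file2 piece when it exists, so pieces that exist only for file2 are skipped.

```shell
#!/bin/bash
# append_pieces: hypothetical helper, not part of the original answer.
# Joins the csplit pieces file1NN and file2NN (NN = 00, 01, ...) found in
# the current directory and prints the merged text to stdout.
append_pieces() {
    local piece nr
    # A glob instead of parsing ls, so odd filenames cannot break the loop.
    for piece in file1[0-9][0-9]; do
        [ -f "$piece" ] || continue    # the glob matched nothing
        nr=${piece#file1}              # e.g. file101 -> 01
        cat "$piece"
        # Append the matching file2 piece only if it exists.
        if [ -f "file2$nr" ]; then
            cat "file2$nr"
        fi
    done
}
```

After running the two csplit commands, it would be called as: append_pieces > file3.txt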

CodePudding user response:

This can be a solution:

#!/bin/bash

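# Step 1: prefix every text line with its page header ("Page N|text").
# Step 2: stable-sort on the header so each page's lines stay in input order.
# Step 3: print each header once, followed by its lines.
cat file1.txt file2.txt |
while IFS= read -r line; do
    case $line in
        Page*) header=$line ;;
        *)     printf '%s|%s\n' "$header" "$line" ;;
    esac
done |
sort -s -t '|' -k1,1 |
(
prev_header=
while IFS='|' read -r header line; do
    if [ "$prev_header" != "$header" ]; then
        prev_header=$header
        echo "$header"
    fi
    echo "$line"
done
)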
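
One thing to watch with the pipeline above: the headers are sorted as text, so "Page 10" would come before "Page 2". A possible variant, assuming every header has the form "Page N" with N an integer and that no header contains "|", sorts on the number instead. merge_by_page is a made-up function name, and -s (stable sort) is there to keep each page's lines in their input order.

```shell
#!/bin/bash
# merge_by_page: hypothetical helper, not part of the original answer.
# Usage: merge_by_page file1.txt file2.txt > file3.txt
merge_by_page() {
    local header= prev_header= line
    cat "$@" |
    while IFS= read -r line; do
        case $line in
            Page\ [0-9]*) header=$line ;;                          # remember the current header
            *)            printf '%s|%s\n' "$header" "$line" ;;    # tag text with its header
        esac
    done |
    # Key = field 1 from character 6 onward (the number after "Page "),
    # compared numerically; -s keeps equal keys in input order.
    sort -s -t '|' -k1.6,1n |
    while IFS='|' read -r header line; do
        # Print each header once, the first time it shows up.
        if [ "$header" != "$prev_header" ]; then
            prev_header=$header
            printf '%s\n' "$header"
        fi
        printf '%s\n' "$line"
    done
}
```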