How to merge two files of different number of lines but maintaining the text of both aligned in the

Time:09-12

Hello, I'm translating the files of an old PS2 video game. I have been using two text files to keep the original text and the translated text on the same lines, but the two texts usually have a different number of lines for the same dialog box. Because of that, I add placeholder text whenever I find a dialog box whose number of lines differs between the two languages, which takes me a long time.

I want to arrange both texts so that they end up in a single file, with each line placed where it belongs and separated from its counterpart by a comma.

text 1 (translated text):

Hello this is
a text to ilustrate
my problem better

it is just a place
holder but i hope
that can help

this one is short
well, relatively

But this one is
pretty pretty 
pretty big, yeah
like, a lot...

text 2 (original text)

こんにちは説明するテキスト
私の問題はより良い

それはただの場所ですホルダーですが、私は願っていますそれは助けることができます

これは短いです
まあ、比較的

しかし、これは
かなりかわいい
かなり大きいです、はい
のように、たくさん...

the required output

Hello this is,こんにちは説明するテキスト
a text to ilustrate,私の問題はより良い
my problem better

it is just a place,それはただの場所ですホルダーですが、私は願っていますそれは助けることができます
holder but i hope
that can help
 
this one is short,これは短いです
well, relatively,まあ、比較的

But this one is,しかし、これは
pretty pretty,かなりかわいい
pretty big, yeah,かなり大きいです、はい
like, a lot...,のように、たくさん...

The only thing the two files have in common is the blank line between each dialog box. Yesterday I tried to do it by adapting the solution from this other thread, without good results:

https://unix.stackexchange.com/questions/632917/how-to-merge-two-files-of-different-number-of-lines-using-blank-line-condition

Thanks in advance <3.

CodePudding user response:

Tested using GNU awk 5.1 but will work in any awk that supports UTF-8 encoding (check the man page for whatever awk version you use):

$ awk -v RS= -F'\n' -v OFS=',' '
    NR==FNR { for (i=1; i<=NF; i++) a[FNR,i]=$i; next }
    { for (i=1; i<=NF; i++) print $i ((FNR,i) in a ? OFS a[FNR,i] : ""); print "" }
' text2 text1
Hello this is,こんにちは説明するテキスト
a text to ilustrate,私の問題はより良い
my problem better

it is just a place,それはただの場所ですホルダーですが、私は願っていますそれは助けることができます
holder but i hope
that can help

this one is short,これは短いです
well, relatively,まあ、比較的

But this one is,しかし、これは
pretty pretty ,かなりかわいい
pretty big, yeah,かなり大きいです、はい
like, a lot...,のように、たくさん...
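
The command pairs blocks by their position (the first block of one file with the first block of the other, and so on), so it relies on both files containing the same number of blank-line-separated blocks. A quick way to check that before merging might look like the sketch below. The helper names count_blocks and same_block_count and the sample file contents are made up for illustration. Note that, in GNU awk at least, a line containing only spaces does not count as a separator in this mode.

```shell
#!/bin/sh
# count_blocks FILE (hypothetical helper):
# prints the number of blank-line-separated blocks in FILE.
# RS= puts awk in paragraph mode, so NR at END is the block count.
count_blocks() {
    awk -v RS= 'END { print NR }' "$1"
}

# same_block_count FILE_A FILE_B (hypothetical helper):
# succeeds only if both files contain the same number of blocks.
same_block_count() {
    [ "$(count_blocks "$1")" -eq "$(count_blocks "$2")" ]
}

# Demo with made-up sample files: two blocks each.
tmp=$(mktemp -d)
printf 'a\nb\n\nc\n' > "$tmp/text1"
printf 'x\n\ny\nz\n' > "$tmp/text2"
if same_block_count "$tmp/text1" "$tmp/text2"; then
    echo "block counts match"
else
    echo "block counts differ; pairing by block number will drift" >&2
fi
rm -r "$tmp"
```

If the counts differ, a stray or missing blank line in one of the files is the first thing to look for.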

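You may need to set your locale first to ensure it supports UTF-8, for example LC_ALL=en_US.UTF-8 or similar. Note that plain LC_ALL=C selects the byte-oriented POSIX locale rather than a UTF-8 one.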

I assumed your text2 file doesn't really have every line starting with a blank as shown in your question. If it does, change a[FNR,i]=$i to { sub(/^ /,"",$i); a[FNR,i]=$i } in the first for loop.
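
With that change applied, the whole command might look like the sketch below. The function name merge_dialogs and the sample file contents are made up for illustration; the awk program is the one above with only the sub() call added.

```shell
#!/bin/sh
# merge_dialogs ORIGINAL TRANSLATED (hypothetical helper name):
# pairs line i of each blank-line-separated block in TRANSLATED with
# line i of the same block in ORIGINAL, joined by a comma.
merge_dialogs() {
    awk -v RS= -F'\n' -v OFS=',' '
        # First file: store every line of every block, minus one leading blank.
        NR==FNR {
            for (i=1; i<=NF; i++) { sub(/^ /, "", $i); a[FNR,i]=$i }
            next
        }
        # Second file: print each line, appending its counterpart if there is one.
        {
            for (i=1; i<=NF; i++) print $i ((FNR,i) in a ? OFS a[FNR,i] : "")
            print ""
        }
    ' "$1" "$2"
}

# Demo with made-up sample files; every line of text2 starts with a blank.
tmp=$(mktemp -d)
printf ' one\n two\n\n three\n' > "$tmp/text2"
printf 'uno\n\ntres\ncuatro\n' > "$tmp/text1"
merge_dialogs "$tmp/text2" "$tmp/text1"
rm -r "$tmp"
```

The demo prints uno,one, then a blank line, then tres,three and cuatro. As with the command above, a line of the first file that has no counterpart in the same block of the second file (here " two") is not printed.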
