Hello, I'm translating the files of an old PS2 video game and have been using two text files so that the original text and the translated text sit on the same lines. However, the two texts usually have a different number of lines for the same dialog box, so whenever I find a dialog box whose line count differs between the languages I add placeholder text, which takes me a long time.
I want to combine both texts into a single file, with each line placed where it belongs and separated from its counterpart by a comma.
text 1 (translated text):
Hello this is
a text to ilustrate
my problem better

it is just a place
holder but i hope
that can help

this one is short
well, relatively

But this one is
pretty pretty
pretty big, yeah
like, a lot...
text 2 (original text):
こんにちは説明するテキスト
私の問題はより良い

それはただの場所ですホルダーですが、私は願っていますそれは助けることができます

これは短いです
まあ、比較的

しかし、これは
かなりかわいい
かなり大きいです、はい
のように、たくさん...
The required output:
Hello this is,こんにちは説明するテキスト
a text to illustrate,私の問題はより良い
my problem better

it is just a place,それはただの場所ですホルダーですが、私は願っていますそれは助けることができます
holder but i hope
that can help

this one is short,これは短いです
well, relatively,まあ、比較的

But this one is,しかし、これは
pretty pretty,かなりかわいい
pretty big, yeah,かなり大きいです、はい
like, a lot...,のように、たくさん...
The only thing the two texts have in common is the blank line between each dialog box. Yesterday I tried to do it by adapting the solution from this other thread, without good results.
Thanks in advance <3.
CodePudding user response:
Tested using GNU awk 5.1 but will work in any awk that supports UTF-8 encoding (check the man page for whatever awk version you use):
$ awk -v RS= -F'\n' -v OFS=',' '
NR==FNR { for (i=1; i<=NF; i++) a[FNR,i]=$i; next }
{ for (i=1; i<=NF; i++) print $i ((FNR,i) in a ? OFS a[FNR,i] : ""); print "" }
' text2 text1
Hello this is,こんにちは説明するテキスト
a text to ilustrate,私の問題はより良い
my problem better

it is just a place,それはただの場所ですホルダーですが、私は願っていますそれは助けることができます
holder but i hope
that can help

this one is short,これは短いです
well, relatively,まあ、比較的

But this one is,しかし、これは
pretty pretty ,かなりかわいい
pretty big, yeah,かなり大きいです、はい
like, a lot...,のように、たくさん...
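For reference, here is a minimal self-contained sketch of the same command. It assumes the dialog boxes in both files are separated by truly empty lines, which is what the question describes. The helper name merge_dialogs, the temporary directory, and the shortened sample contents are illustrative assumptions, not part of the original command. Setting RS to the empty string puts awk in paragraph mode, so each blank-line-separated block is one record, and -F'\n' makes each line of that block a field.

```shell
#!/bin/sh
# Sketch only: merge_dialogs is a hypothetical wrapper around the awk command.
# $1 = original text (read first and stored), $2 = translated text (printed).
merge_dialogs() {
  awk -v RS= -F'\n' -v OFS=',' '
  # First file: remember line i of dialog box FNR.
  NR==FNR { for (i=1; i<=NF; i++) a[FNR,i]=$i; next }
  # Second file: print each line, plus its counterpart when one was stored.
  { for (i=1; i<=NF; i++) print $i ((FNR,i) in a ? OFS a[FNR,i] : ""); print "" }
  ' "$1" "$2"
}

# Assumed sample data: two dialog boxes per file, separated by an empty line.
tmp=$(mktemp -d)
printf '%s\n' 'Hello this is' 'a text to ilustrate' 'my problem better' '' \
              'this one is short' 'well, relatively' > "$tmp/text1"
printf '%s\n' 'こんにちは説明するテキスト' '私の問題はより良い' '' \
              'これは短いです' 'まあ、比較的' > "$tmp/text2"

merge_dialogs "$tmp/text2" "$tmp/text1"
rm -r "$tmp"
```

Note that the loop runs over the lines of the second file (text1), so if a dialog box has more lines in text2 than in text1, the extra text2 lines are not printed.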
You may need to set LC_ALL first (e.g. to a UTF-8 locale such as en_US.UTF-8, or to C so that awk treats the input as plain bytes) if your awk mishandles the Japanese text.
I assumed your text2 file doesn't really have every line starting with a blank as shown in your question; if it does, then change a[FNR,i]=$i to { sub(/^ /,"",$i); a[FNR,i]=$i } in the first for loop.
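With that change applied, the whole command would look roughly like the sketch below. It assumes each text line of text2 starts with exactly one space and that the separator lines between dialog boxes are still completely empty (a line holding only a space would not end a record in paragraph mode). The helper name merge_dialogs_trim and the sample lines are hypothetical.

```shell
#!/bin/sh
# Sketch only: merge_dialogs_trim is a hypothetical wrapper name.
# $1 = original text whose lines start with a blank, $2 = translated text.
merge_dialogs_trim() {
  awk -v RS= -F'\n' -v OFS=',' '
  NR==FNR {
    # Strip one leading blank from each stored line of the first file.
    for (i=1; i<=NF; i++) { sub(/^ /,"",$i); a[FNR,i]=$i }
    next
  }
  { for (i=1; i<=NF; i++) print $i ((FNR,i) in a ? OFS a[FNR,i] : ""); print "" }
  ' "$1" "$2"
}

# Assumed sample data: one dialog box, original lines indented by one space.
tmp=$(mktemp -d)
printf '%s\n' ' これは短いです' ' まあ、比較的' > "$tmp/text2"
printf '%s\n' 'this one is short' 'well, relatively' > "$tmp/text1"

merge_dialogs_trim "$tmp/text2" "$tmp/text1"
rm -r "$tmp"
```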