My goal is to have some sort of script that can turn "blocks" of lines into single-line strings. For example, turning this:
ジナ「あんまり、おそくならないようにね。
さ、行ってらっしゃい。

ジナ「あっ、そうそう。
はい、おこづかい。
お祭り楽しんでらっしゃい。
into this:
ジナ「あんまり、おそくならないようにね。さ、行ってらっしゃい。
ジナ「あっ、そうそう。はい、おこづかい。お祭り楽しんでらっしゃい。
For an English example, turning this:
MOM: Run along now, and be back
before dinner.

MOM: Oh, I almost forgot!
Here's your allowance, dear!
Have fun at the fair!
into this:
MOM: Run along now, and be back before dinner.
MOM: Oh, I almost forgot! Here's your allowance, dear! Have fun at the fair!
However, the English version adds an extra (and, for the Japanese text, unnecessary) challenge: a space has to be inserted where the lines are joined. So just use it as a way to understand what I want to happen. I'm assuming I'd need a sed/awk script, because although I considered regex, it seems I'd need a more powerful tool. Any solution would be wonderful, though!
Answer:
It sounds like you want to set the output record separator (ORS) to two newlines and the output field separator (OFS) to a single space. So just do that:
$ cat input
MOM: Run along now, and be back
before dinner.

MOM: Oh, I almost forgot!
Here's your allowance, dear!
Have fun at the fair!
$ awk '{$1=$1}1' RS= OFS=' ' ORS='\n\n' input
MOM: Run along now, and be back before dinner.

MOM: Oh, I almost forgot! Here's your allowance, dear! Have fun at the fair!
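That command joins lines with a space, which suits the English text. For the Japanese text, where nothing should be inserted between lines, a minimal variation (a sketch, not tested against your real file; input.ja is a hypothetical file name) keeps RS empty so blank lines still separate blocks, but splits fields only on newlines and joins them with an empty OFS. ORS is a single newline here so each block comes out on its own line, as in your desired output.

```shell
# Hypothetical sample file; the two blocks are separated by one blank line.
printf '%s\n' 'ジナ「あんまり、おそくならないようにね。' 'さ、行ってらっしゃい。' '' \
    'ジナ「あっ、そうそう。' 'はい、おこづかい。' 'お祭り楽しんでらっしゃい。' > input.ja

# RS=       blank lines separate records ("blocks")
# FS='\n'   split fields only at newlines, so spaces inside a line are kept
# OFS=      join the lines with nothing between them
# ORS='\n'  print each joined block on a line of its own
awk '{$1=$1}1' RS= FS='\n' OFS= ORS='\n' input.ja
```

Setting FS to a newline, rather than leaving the default, matters if a line ever contains spaces: with the default FS and an empty OFS, those spaces would be squeezed out as well.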
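Setting RS to the empty string puts awk into "paragraph mode": records are separated by one or more blank lines (a line with no text at all; lines that contain only whitespace do not count), which seems to be what you mean by a "block". In this mode a newline always acts as a field separator, so the assignment $1=$1 rebuilds each record with its lines joined by OFS, and the trailing 1 is an always-true pattern that prints the rebuilt record.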
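If your real file might have lines holding only spaces or tabs between blocks (an assumption; the question doesn't say), those lines will not end a record in paragraph mode, and neighbouring blocks get merged. One hedged workaround is to empty such lines with sed before awk sees them; input.ws below is a hypothetical file name.

```shell
# Hypothetical input: the "blank" line between the two blocks holds three spaces.
printf '%s\n' 'MOM: Run along now, and be back' 'before dinner.' '   ' \
    'MOM: Oh, I almost forgot!' 'Here'\''s your allowance, dear!' 'Have fun at the fair!' > input.ws

# Without normalizing, both blocks come out as one joined line.
awk '{$1=$1}1' RS= OFS=' ' ORS='\n\n' input.ws

# Emptying whitespace-only lines first restores the blank-line separators,
# so each block is joined separately.
sed 's/^[[:space:]]*$//' input.ws | awk '{$1=$1}1' RS= OFS=' ' ORS='\n\n'
```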