How to add space between Devanagari and English in bash script?

Time:03-20

I have a text file like so,

#greenऔर
<सेमीकोलन>
actionएक्शनmysql
admin2को

The expected output is,

#green और
< सेमीकोलन >
action एक्शन mysql
admin2 को

This is what I have tried so far: sed 's/[अ-ह].*/ &/g' testfile, but the output I am getting is this:

#green और
< सेमीकोलन>
action एक्शनmysql
admin2 को

Is there any way to get the expected output using awk or sed?

CodePudding user response:

You can use Perl here:

perl -i -CSD -Mutf8 -pe 's/(?<=[अ-ह\p{M}])(?=[^अ-ह\p{M}])|(?<=[^अ-ह\p{M}])(?=[अ-ह])/ /g' filename

Here is a quick demo script:

#!/bin/bash
s='#greenऔर
<सेमीकोलन>
actionएक्शनmysql
admin2को'
perl -CSD -Mutf8 -pe 's/(?<=[अ-ह\p{M}])(?=[^अ-ह\p{M}])|(?<=[^अ-ह\p{M}])(?=[अ-ह])/ /g' <<< "$s"

Output:

#green और 
< सेमीकोलन >
action एक्शन mysql
admin2 को 

The regex matches

  • (?<=[अ-ह\p{M}])(?=[^अ-ह\p{M}]) - a location between a Devanagari letter from the [अ-ह] range or a diacritic mark (\p{M}) and a char that is neither a Devanagari letter nor a diacritic mark
  • | - or
  • (?<=[^अ-ह\p{M}])(?=[अ-ह]) - a location between a char that is neither a Devanagari letter nor a diacritic mark and a Devanagari letter.

CodePudding user response:

The .* matches the entire remainder of the line, and renders the g flag useless. Assuming the character class is correct (sorry, I'm unfamiliar with Devanagari) you could use

sed 's/[अ-ह]\+/ & /g' testfile

though you'll probably end up with some extra spaces you'll want to remove.

sed 's/[अ-ह]\+/ & /g;
    s/^ //;s/ $//;s/  / /g' testfile
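
To see why the .* in the original attempt defeats the g flag, here is a small illustration. It uses a plain ASCII "X" as a stand-in for a Devanagari letter (an assumption made purely so the demo does not depend on how your locale handles the [अ-ह] range); the input string is made up for the example.

```shell
#!/bin/bash
# ASCII stand-in: "X" plays the role of a Devanagari letter (assumption for
# illustration only; the sample string is invented).
line='aaXbbXcc'

# With .*: the first match starts at the first X and swallows the rest of the
# line, so the g flag has nothing left to match.
greedy=$(printf '%s\n' "$line" | sed 's/X.*/ &/g')

# Without .*: every X is matched on its own, so g applies to each occurrence.
each=$(printf '%s\n' "$line" | sed 's/X/ &/g')

printf '%s\n%s\n' "$greedy" "$each"
# prints:
# aa XbbXcc
# aa Xbb Xcc
```

The first result mirrors what happened with actionएक्शनmysql in the question: a space goes in before the first match only, and the rest of the line is left alone.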