I have a text file like this:
#greenऔर
<सेमीकोलन>
actionएक्शनmysql
admin2को
The expected output is:
#green और
< सेमीकोलन >
action एक्शन mysql
admin2 को
This is what I have tried so far:
sed 's/[अ-ह].*/ &/g' testfile
but the output I am getting is this:
#green और
< सेमीकोलन>
action एक्शनmysql
admin2 को
Is there any way to get the expected output using awk or sed?
Answer:
You can use Perl here:
perl -i -CSD -Mutf8 -lpe 's/(?<=[अ-ह\p{M}])(?=[^अ-ह\p{M}])|(?<=[^अ-ह\p{M}])(?=[अ-ह])/ /g' filename
Here is a self-contained demo:
#!/bin/bash
s='#greenऔर
<सेमीकोलन>
actionएक्शनmysql
admin2को'
perl -CSD -Mutf8 -lpe 's/(?<=[अ-ह\p{M}])(?=[^अ-ह\p{M}])|(?<=[^अ-ह\p{M}])(?=[अ-ह])/ /g' <<< "$s"
Output:
#green और
< सेमीकोलन >
action एक्शन mysql
admin2 को
The regex matches either of two zero-width positions (\p{M} is any Unicode mark, which covers the Devanagari vowel signs and the virama):
(?<=[अ-ह\p{M}])(?=[^अ-ह\p{M}]) - a location between a Devanagari letter from the [अ-ह] range or a diacritic mark (\p{M}) and a char that is neither such a letter nor a diacritic mark
| - or
(?<=[^अ-ह\p{M}])(?=[अ-ह]) - a location between a char that is neither a Devanagari letter nor a diacritic mark and a Devanagari letter from the [अ-ह] range.
Answer:
The .* matches the entire remainder of the line, which renders the g flag useless. Assuming the character class is correct (sorry, I'm unfamiliar with Devanagari), you could use
sed 's/[अ-ह]\+/ & /g' testfile
though you'll probably end up with some extra spaces (at the start or end of a line, or doubled where a space was already there) that you'll want to remove:
sed 's/[अ-ह]\+/ & /g;s/^ //;s/ $//;s/  */ /g' testfile
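As the answer itself hedges, the class needs checking: [अ-ह] only spans U+0905 to U+0939, the independent vowels and consonants. Dependent vowel signs such as े (U+0947) and the virama ् (U+094D) fall outside it, so a word like सेमीकोलन would be cut into several runs with spaces between them. Below is a minimal sketch that widens the class to the whole Devanagari block (U+0900 to U+097F) and chains the clean-up steps, run on the question's sample. It assumes a sed that accepts -E (GNU and BSD sed both do) and that a UTF-8 locale named C.UTF-8 exists; the variable names are hypothetical.

```shell
#!/bin/bash
# Sketch only. Assumptions: sed understands -E, and a UTF-8 locale
# named C.UTF-8 is available. Variable names are illustrative.
input='#greenऔर
<सेमीकोलन>
actionएक्शनmysql
admin2को'

# [ऀ-ॿ] is the whole Devanagari block, U+0900..U+097F. Unlike [अ-ह]
# (U+0905..U+0939), it also covers vowel signs such as े, the virama ्,
# the anusvara ं and the Devanagari digits, so a word stays in one run.
# The four commands: pad each run with spaces, squeeze repeated spaces,
# then trim one leading and one trailing space.
spaced=$(printf '%s\n' "$input" |
  LC_ALL=C.UTF-8 sed -E 's/[ऀ-ॿ]+/ & /g; s/ +/ /g; s/^ //; s/ $//')

printf '%s\n' "$spaced"
```

Squeezing before trimming means a line that already had a space next to a Devanagari word ends up with exactly one space there.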