Bash Converting to title case without letters following a diacritic being capitalised

Time:01-21

I have a text file, test.txt (macOS bash, UK English locale), containing three names, one of which starts with an accented letter (diacritic):

test.txt

Ève
Eve
eve

I want to convert all to Title Case.

cat test.txt  | gsed 's/.*/\L&/; s/[a-z]*/\u&/g'

yields the result

ÈVe
Eve
Eve

The issue is that the first one, ÈVe, should be Ève.

The letter following the diacritic has been incorrectly capitalised. How can I amend the pipe sequence to either prevent (preferable) or correct this issue? For the purposes of the question, please assume no LC_* environment variables are set.

CodePudding user response:

Use character classes. [a-z] matches only the letters a to z, whereas [[:alpha:]] matches every letter in the current locale. See https://www.gnu.org/software/sed/manual/html_node/Character-Classes-and-Bracket-Expressions.html

sed 's/^\([[:alpha:]]\)\(.*\)/\U\1\L\2/g'

Also note that you need GNU sed with Unicode support, and a locale that uses UTF-8 (or whichever encoding matches the file).
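
To see why the original [a-z] pattern misfires, it can help to look at the bytes. Assuming test.txt is UTF-8 encoded, È is stored as the two bytes c3 88. Without a UTF-8 locale, sed treats each of them as a separate non-letter character, so the following v begins a new [a-z]* match and gets upper-cased. A quick check (the variable name is just for illustration, and this assumes the snippet itself is saved as UTF-8):

```shell
# Dump the raw bytes of the accented name as hex, with whitespace stripped.
bytes=$(printf 'Ève' | od -An -tx1 | tr -d ' \n')
echo "$bytes"   # c3887665: È is c3 88, then v (76) and e (65)
```

If the first two bytes are anything other than c3 88, the file is probably not UTF-8 and the locale would need to match that encoding instead.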

CodePudding user response:

If you can use Perl:

perl -pe 's/^(\w)/\U$1/' test.txt

CodePudding user response:

As an alternative solution, I have found:


gawk '{for(j=1;j<=NF;j++){ $j=toupper(substr($j,1,1)) substr($j,2) }}1' "$inputFile"

This works on an example set:


Ève
Eve
eve
bob smith

Cases such as bob_smith, where words are joined by a character other than whitespace, are easily pre- or post-processed to suit.

CodePudding user response:

With bash:

declare -c string; while IFS= read -r string; do echo "$string"; done < file

Output:

Ève
Eve
Eve
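
As far as I can tell, declare -c is not described in the bash manual, so a sketch using the documented case-modifying expansions may be safer. It assumes bash 4.0 or later (the stock macOS bash 3.2 does not support these expansions), and whether an accented initial is converted depends on the locale bash runs in. The function name title_first is made up for this example.

```shell
# Hypothetical helper: lower-case each line, then upper-case its first character.
# Only the first word of a line is capitalised, like the declare -c loop above.
title_first() {
  local line
  while IFS= read -r line; do
    line=${line,,}                # ${var,,} lower-cases the whole string
    printf '%s\n' "${line^}"      # ${var^} upper-cases the first character
  done
}

printf 'eVE\nbob smith\n' | title_first
```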