I have a text file, test.txt (macOS, bash, UK English locale), containing three names, one of which is accented (has a diacritic):
test.txt
Ève
Eve
eve
I want to convert all to Title Case.
cat test.txt | gsed 's/.*/\L&/; s/[a-z]*/\u&/g'
yields the result
ÈVe
Eve
Eve
The issue is that the first one, ÈVe, should be Ève: the letter after the diacritic has been wrongly capitalised. How can I amend the pipeline to either prevent (preferably) or correct this? For the purposes of the question, please assume no LC_* environment variables are set.
Answer:
Use character classes: [a-z] matches only the letters a to z, whereas [[:alpha:]] matches all letters in the current locale, including accented ones such as è. See https://www.gnu.org/software/sed/manual/html_node/Character-Classes-and-Bracket-Expressions.html
sed 's/^\([[:alpha:]]\)\(.*\)/\U\1\L\2/g'
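This uppercases the first letter of each line and lowercases the rest of the line. Note also that it needs GNU sed built with multibyte (Unicode) support and a UTF-8 locale (or another locale matching the file's encoding).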
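The same character class can be dropped into the question's original two-step pipeline to title-case every word, not just the first one on the line. This is a minimal sketch, assuming GNU sed is reachable as gsed (e.g. Homebrew on macOS) or as plain sed (Linux), and that at least one UTF-8 locale is installed; rather than relying on LANG, it uses the first UTF-8 locale that locale -a reports. The function name title_case_sed is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: title-case every word with GNU sed, matching letters with
# [[:alpha:]] instead of [a-z] so that accented letters count as letters.
# Assumptions (not from the question): GNU sed is installed as gsed or sed,
# and at least one UTF-8 locale is installed.

SED=$(command -v gsed || command -v sed)
# Pick the first installed UTF-8 locale (names vary: en_GB.UTF-8, C.utf8, ...).
UTF8_LOCALE=$(locale -a 2>/dev/null | grep -i -m 1 -E '\.utf-?8$')

title_case_sed() {
  # 1st command: lowercase the whole line (\L).
  # 2nd command: uppercase the first character (\u) of each run of letters.
  # \+ (one or more) is a GNU extension; it avoids matching empty strings.
  LC_ALL=$UTF8_LOCALE "$SED" 's/.*/\L&/; s/[[:alpha:]]\+/\u&/g'
}

# Demo with the question's input; on the real file use: title_case_sed < test.txt
printf '%s\n' 'Ève' 'Eve' 'eve' 'bob smith' | title_case_sed
```

In a UTF-8 locale this should print Ève, Eve, Eve and Bob Smith, one per line. Using \+ instead of * only skips pointless empty matches; the output is the same either way.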
Answer:
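If you can use Perl (the -CSD switch makes Perl decode its input and encode its output as UTF-8, so \w also matches accented letters such as è):
perl -CSD -pe 's/^(\w)/\U$1/' test.txt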
Answer:
As an alternative solution, I have found:
gawk '{for(j=1;j<=NF;j++){ $j=toupper(substr($j,1,1)) substr($j,2) }}1' "$inputFile"
works on this example set:
Ève
Eve
eve
bob smith
Cases such as bob_smith, where words are joined by something other than whitespace, are easy to pre- or post-process to suit.
Answer:
With bash 4 or later (the bash 3.2 that ships with macOS does not support declare -c):
declare -c string; while IFS= read -r string; do echo "$string"; done < file
Output:
Ève
Eve
Eve
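For lines containing several words, a sketch that title-cases each word explicitly uses bash 4 case-modification expansions instead of the declare attribute: ${var,,} lowercases every character, and ${array[@]^} uppercases the first character of each array element. It assumes bash 4 or later and an installed UTF-8 locale; the function name title_case_bash is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: title-case each word with bash 4+ parameter expansions.
# Assumptions: bash >= 4 (not macOS's /bin/bash 3.2) and a UTF-8 locale.

UTF8_LOCALE=$(locale -a 2>/dev/null | grep -i -m 1 -E '\.utf-?8$')

title_case_bash() (
  # Subshell body, so the locale change does not leak to the caller.
  [ -n "$UTF8_LOCALE" ] && export LC_ALL=$UTF8_LOCALE
  while IFS= read -r line; do
    # Lowercase the whole line, then split it on whitespace into an array.
    read -r -a words <<< "${line,,}"
    # ^ uppercases the first character of every element; [*] rejoins the
    # elements with single spaces (runs of whitespace are collapsed).
    printf '%s\n' "${words[*]^}"
  done
)

# Demo; on the real file use: title_case_bash < file
printf '%s\n' 'Ève' 'eve' 'bob smith' | title_case_bash
```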