How can I split a camelCase string into an array in awk using the split function?
Input:
STRING="camelCasedExample"
Desired Result:
WORDS[1]="camel"
WORDS[2]="Cased"
WORDS[3]="Example"
Bad Attempt:
split(STRING, WORDS, /([a-z])([A-Z])/);
Bad Result:
WORDS[1]="came"
WORDS[2]="ase"
WORDS[3]="xample"
CodePudding user response:
You can't do it with split() alone, which is why GNU awk has patsplit():
$ awk 'BEGIN {
n = patsplit("camelCasedExample",words,/(^|[[:upper:]])[[:lower:]]+/)
for ( i=1; i<=n; i++ ) print words[i]
}'
camel
Cased
Example
CodePudding user response:
With your shown samples, please try the following. It was written and tested in GNU awk and should work in any awk. It creates an array named words whose values can be accessed by index, starting from 1. Here I simply print the values, but you can use the array later however you wish.
awk -F'=|"' -v s1="\"" '
{
gsub(/[A-Z]/,"\n&",$3)
val=(val?val ORS:"")$3
}
END{
num=split(val,words,ORS)
for(i=1;i<=num;i++){
if(words[i]!=""){
print "WORDS[" ++count "]=" s1 words[i] s1
}
}
}
' Input_file
Explanation: a detailed explanation of the above awk code.
awk -F'=|"' -v s1="\"" ' ##Start the awk program, set the field separator to = OR " and set s1 to ".
{
gsub(/[A-Z]/,"\n&",$3) ##Globally prefix every capital letter in the 3rd field with a newline (& is the matched letter).
val=(val?val ORS:"") $3 ##Append $3 to val, separating successive values with ORS.
}
END{ ##Start the END block of the program.
num=split(val,words,ORS) ##Split val into the array words using ORS as the delimiter.
for(i=1;i<=num;i++){ ##Loop from 1 up to num.
if(words[i]!=""){ ##Only handle array elements that are not empty.
print "WORDS[" ++count "]=" s1 words[i] s1 ##Print WORDS[ then the incremented count, then ]= followed by s1, words[i] and s1.
}
}
}
' Input_file ##Name of the input file.
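The same gsub-then-split idea can also be applied to a string held in a variable instead of a file. The following is only a minimal sketch: the shell function name split_camel is made up for this example, and using SUBSEP as the temporary marker assumes that character never occurs in the input.

```shell
# Hypothetical helper: split_camel is a name chosen for this sketch.
split_camel() {
    awk -v s="$1" 'BEGIN {
        # Prefix every capital letter with SUBSEP (assumed not to occur in s).
        gsub(/[A-Z]/, SUBSEP "&", s)
        # split() now has a separator it can consume without eating letters.
        n = split(s, words, SUBSEP)
        for (i = 1; i <= n; i++)
            if (words[i] != "") print words[i]   # skip the empty piece if s starts with a capital
    }'
}

split_camel "camelCasedExample"
```

Because the marker is inserted in front of each capital rather than replacing it, split() only removes the marker and every letter survives.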
CodePudding user response:
Here is an awk solution that would work with any version of awk:
s='camelCasedExample'
awk '{
while (match($0, /(^|[[:upper:]])[[:lower:]]+/)) {
wrd = substr($0,RSTART,RLENGTH)
print wrd
# you can also store it in array
arr[++n] = wrd
$0 = substr($0,RSTART+RLENGTH)
}
}' <<< "$s"
camel
Cased
Example
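If the words are needed in an array rather than printed immediately, the match() loop can be wrapped in a user-defined awk function. This is a sketch under assumptions, not part of the answer above: the names camel_words and camel_split are invented for illustration, and the ASCII ranges [A-Z] and [a-z] stand in for the POSIX character classes.

```shell
# Hypothetical wrapper: camel_words and camel_split are names made up for this sketch.
camel_words() {
    awk -v s="$1" '
    # Fills arr[1..n] with the words found in str and returns n.
    function camel_split(str, arr,    n) {
        split("", arr)                       # clear any previous contents
        n = 0
        while (match(str, /(^|[A-Z])[a-z]+/)) {
            arr[++n] = substr(str, RSTART, RLENGTH)
            str = substr(str, RSTART + RLENGTH)
        }
        return n
    }
    BEGIN {
        n = camel_split(s, words)
        for (i = 1; i <= n; i++)
            print "WORDS[" i "]=\"" words[i] "\""
    }'
}

camel_words "camelCasedExample"
```

Returning the count lets the caller loop from 1 to n in order, which a for (i in arr) loop does not guarantee.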