How to split a camelCase string into an array in awk?

How can I split a camelCase string into an array in awk using the split function?

Input:

STRING="camelCasedExample"

Desired Result:

WORDS[1]="camel"
WORDS[2]="Cased"
WORDS[3]="Example"

Bad Attempt:

split(STRING, WORDS, /([a-z])([A-Z])/);

Bad Result:

WORDS[1]="came"
WORDS[2]="ase"
WORDS[3]="xample"

CodePudding user response：

You can't do it with split() alone. That is why GNU awk has patsplit(), which matches the fields rather than the separators:

$ awk 'BEGIN {
    n = patsplit("camelCasedExample",words,/(^|[[:upper:]])[[:lower:]]+/)
    for ( i=1; i<=n; i++ ) print words[i]
}'
camel
Cased
Example
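
For context on why split() alone falls short: it treats every regex match as a separator and throws it away, so the letters on either side of each word boundary are lost. A minimal sketch that reproduces the Bad Result from the question (show_split is just an illustrative wrapper name, not something from this thread):

```shell
# Illustrative helper name (not from the thread): shows what split() keeps.
show_split() {
    awk 'BEGIN {
        # Each match of [a-z][A-Z] ("lC", "dE") is treated as a separator and dropped.
        n = split("camelCasedExample", words, /[a-z][A-Z]/)
        for (i = 1; i <= n; i++) print words[i]
    }'
}

show_split
```

This prints came, ase and xample, one per line, because the separator matches consumed "lC" and "dE".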

CodePudding user response：

With your shown samples, please try the following. It was written and tested in GNU awk and should work in any awk. It creates an array named words whose values can be accessed by index, starting at 1. I am printing it as output here, but you can use the array later however you wish.

awk -F'=|"' -v s1="\"" '
{
  gsub(/[A-Z]/,"\n&",$3)
  val=(val?val ORS:"")$3
}
END{
  num=split(val,words,ORS)
  for(i=1;i<=num;i++){
    if(words[i]!=""){
      print "WORDS[" (++count) "]=" s1 words[i] s1
    }
  }
}
' Input_file

Explanation: a detailed, line-by-line explanation of the above awk code.

awk -F'=|"' -v s1="\"" '                     ##Start the awk program, setting the field separator to = OR " and setting s1 to ".
{
  gsub(/[A-Z]/,"\n&",$3)                     ##Use gsub to globally replace each capital letter in the 3rd field with a newline followed by the letter itself.
  val=(val?val ORS:"") $3                    ##Append $3 to val, separating successive values with ORS.
}
END{                                         ##Start the END block of this program.
  num=split(val,words,ORS)                   ##Split val into the array words, using ORS as the delimiter.
  for(i=1;i<=num;i++){                       ##Run a for loop from 1 to num.
    if(words[i]!=""){                        ##If the array item is not empty, do the following.
       print "WORDS[" (++count) "]=" s1 words[i] s1    ##Print WORDS[, the incremented count, ]=, then the words[i] value wrapped in s1.
    }
  }
}
'  Input_file                                ##Mention the Input_file name here.

CodePudding user response：

Here is an awk solution that would work with any version of awk:

s='camelCasedExample'
awk '{
   while (match($0, /(^|[[:upper:]])[[:lower:]]+/)) {
      wrd = substr($0,RSTART,RLENGTH)
      print wrd
      # you can also store it in an array
      arr[++n] = wrd
      $0 = substr($0,RSTART+RLENGTH)
   }
   }
}' <<< "$s"

camel
Cased
Example
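
If you need this in more than one place, the match() loop above can be wrapped in an awk function that fills an array and returns the word count. This is only a sketch: the names split_camel and camel_split are made up for illustration, it assumes an awk that supports POSIX character classes such as [[:upper:]], and like the loop above it skips runs of consecutive capitals (for example "HTTP" in "HTTPServer").

```shell
# Hypothetical shell wrapper; the name split_camel is illustrative only.
split_camel() {
    awk -v s="$1" '
    # Fill arr[1..n] with the camelCase words of str and return n.
    function camel_split(str, arr,    n) {
        split("", arr)          # portable way to empty the array
        n = 0
        while (match(str, /(^|[[:upper:]])[[:lower:]]+/)) {
            arr[++n] = substr(str, RSTART, RLENGTH)
            str = substr(str, RSTART + RLENGTH)   # continue after the match
        }
        return n
    }
    BEGIN {
        n = camel_split(s, words)
        for (i = 1; i <= n; i++)
            printf "WORDS[%d]=\"%s\"\n", i, words[i]
    }'
}

split_camel "camelCasedExample"
```

The indexed for loop is used instead of for (i in words) because awk does not guarantee the traversal order of for-in.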