I'm trying to create a regex to find concatenated strings, or strings with randomly capitalized words. I need to find things like "EmployeeID", "messageIndex", "JOBname", "KeyRange", "Type21", etc. I can already find concatenated text that uses a delimiter such as an underscore with "^[A-Za-z0-9]+(?:_[A-Za-z0-9]+)+$", but how can I find strings without a delimiter? My attempts just keep matching every word. Thanks!
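For reference, here is a minimal Python sketch of the underscore pattern from the question. It assumes Python's `re` flavor (the question does not name a regex engine), and the strings "employee_id" and "JOB_name" are made-up examples. It shows that the pattern accepts underscore-joined names but rejects the delimiter-free ones.

```python
import re

# The delimiter-based pattern from the question: alphanumeric runs joined
# by at least one underscore (the trailing group must occur 1+ times).
UNDERSCORE_PATTERN = re.compile(r"^[A-Za-z0-9]+(?:_[A-Za-z0-9]+)+$")

# "employee_id" and "JOB_name" are hypothetical examples for illustration.
samples = ["employee_id", "JOB_name", "EmployeeID", "hello"]
for s in samples:
    # True only when the whole string is underscore-delimited
    print(s, UNDERSCORE_PATTERN.search(s) is not None)
```

Because the `(?:_...)+` group needs at least one underscore, "EmployeeID" and "hello" both fail, which is why a different pattern is needed for the delimiter-free case.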
Answer:
Please see if this works for your use case:
\b((([A-Z0-9]+[a-z0-9]*)+)|(([a-z0-9]+[A-Z0-9]*))+)\b
I tested this on the following strings, and it matched each of them: "EmployeeID", "messageIndex", "JOBname", "KeyRange", "Type21"
This matches either uppercase letters followed by lowercase letters, or lowercase letters followed by uppercase letters (digits are allowed on both sides), occurring at least once and possibly repeating. The \b anchors establish word boundaries. Without seeing the exact string you are attempting to parse (or a variation of it, if it contains sensitive information), it is difficult to determine the right expression for your situation.
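A quick way to check this answer is to run the pattern against the sample words. The sketch below assumes Python's `re` flavor and uses `fullmatch` so that the whole word must be consumed; "hello" and "World" are extra test words added for illustration.

```python
import re

# First answer's pattern: one or more runs of "uppercase/digits then
# lowercase/digits", OR one or more runs of "lowercase/digits then
# uppercase/digits", between word boundaries.
ANSWER1 = re.compile(r"\b((([A-Z0-9]+[a-z0-9]*)+)|(([a-z0-9]+[A-Z0-9]*))+)\b")

targets = ["EmployeeID", "messageIndex", "JOBname", "KeyRange", "Type21"]
plain_words = ["hello", "World"]  # hypothetical non-target words

for word in targets + plain_words:
    # fullmatch: the pattern must consume the entire word
    print(word, ANSWER1.fullmatch(word) is not None)
```

All five targets match, but so do "hello" and "World": because `[a-z0-9]*` and `[A-Z0-9]*` may match zero characters, any plain alphanumeric word satisfies one of the two alternatives. That is the "matches every word" behavior described in the question.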
Answer:
Use
^[A-Za-z]+(?:[A-Z0-9]+[A-Za-z0-9]*)+$
See regex proof. In brief: first match at least one letter, then require an uppercase letter or a digit, and then match any letters and digits.
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
[A-Za-z]+ any character of: 'A' to 'Z', 'a' to 'z'
(1 or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (1 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
[A-Z0-9]+ any character of: 'A' to 'Z', '0' to '9'
(1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
[A-Za-z0-9]* any character of: 'A' to 'Z', 'a' to
'z', '0' to '9' (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
)+ end of grouping
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
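The following minimal sketch, assuming Python's `re` flavor, checks this pattern against the sample words plus two made-up plain words. The second half is an assumption beyond the answer: to scan running text rather than validate one string at a time, it swaps the `^`/`$` anchors for `\b` word boundaries.

```python
import re

# Second answer's pattern: a leading run of letters, then one or more groups
# that each start with an uppercase letter or digit.
ANSWER2 = re.compile(r"^[A-Za-z]+(?:[A-Z0-9]+[A-Za-z0-9]*)+$")

targets = ["EmployeeID", "messageIndex", "JOBname", "KeyRange", "Type21"]
plain_words = ["hello", "Hello"]  # hypothetical non-target words

for word in targets + plain_words:
    print(word, ANSWER2.search(word) is not None)

# Assumed variant for searching inside longer text: \b instead of ^ and $.
IN_TEXT = re.compile(r"\b[A-Za-z]+(?:[A-Z0-9]+[A-Za-z0-9]*)+\b")
print(IN_TEXT.findall("Look up the EmployeeID and the messageIndex in this table"))
```

Unlike the first answer's pattern, "hello" and "Hello" are rejected, because an uppercase letter or digit must appear after at least one leading letter. Note that all-caps words such as "JOB" still match ("J" followed by "OB"). The `findall` call returns `['EmployeeID', 'messageIndex']`.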