I have the following function in Python 3.10:
import re

def camel_to_screaming_snake(value):
    return '_'.join(re.findall(
        r'[A-Z][a-z]+|[0-9A-Z]+(?=[A-Z][a-z])|[0-9A-Z]{2,}|[a-z0-9]{2,}|[a-zA-Z0-9]',
        value)).upper()
The goal is to insert an underscore wherever the case in a string changes, or between a non-numeric character and a numeric character. However, the following inputs do not give the expected results (input on the left, actual return value on the right):
vn3b -> VN3B
vnRbb250V -> VN_RBB_250V
I am expecting the following return values:
vn3b -> VN_3_B
vnRbb250V -> VN_RBB_250_V
What in my regex is preventing this from working as expected?
Answer:
You want to find particular boundaries and insert an _ at each of them. To match a boundary, you can use the following regex format: (?<=<before>)(?=<after>). These are zero-width lookbehind and lookahead assertions (they check the surrounding characters without consuming any), where <before> and <after> are replaced with the conditions.
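To see that such a pattern matches a position rather than any characters, here is a minimal sketch. The input string "camelCase" and the variable name are only illustrative and do not come from the question:

```python
import re

# Zero-width boundary: a lowercase letter before, an uppercase letter after.
boundary = re.compile(r"(?<=[a-z])(?=[A-Z])")

for m in boundary.finditer("camelCase"):
    # The match is empty, so start and end are the same index.
    print(m.start(), m.end(), repr(m.group()))  # 5 5 ''
```

Because the match is empty, substituting "_" for it inserts the underscore without removing anything.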
Your expected output keeps RBB together, so an uppercase letter followed by a lowercase one is not a boundary. That leaves three kinds of boundary to find:
- A lowercase letter followed by an uppercase letter:
(?<=[a-z])(?=[A-Z])
- A number followed by a letter:
(?<=[0-9])(?=[a-zA-Z])
- A letter followed by a number:
(?<=[a-zA-Z])(?=[0-9])
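As a rough check, each of these patterns can be applied on its own to one of the inputs from the question to see which boundary it picks out. The dictionary and its keys are just illustrative names:

```python
import re

# The three boundary patterns from the list above, applied one at a time.
rules = {
    "lower_to_upper": r"(?<=[a-z])(?=[A-Z])",
    "digit_to_letter": r"(?<=[0-9])(?=[a-zA-Z])",
    "letter_to_digit": r"(?<=[a-zA-Z])(?=[0-9])",
}

results = {name: re.sub(pattern, "_", "vnRbb250V") for name, pattern in rules.items()}

for name, result in results.items():
    print(name, "->", result)
# lower_to_upper -> vn_Rbb250V
# digit_to_letter -> vnRbb250_V
# letter_to_digit -> vnRbb_250V
```

Each rule contributes exactly one of the three underscores expected for this input.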
You can combine these into a single regex with the alternation operator | and then call sub on the compiled pattern to replace each boundary with an underscore. Afterwards you can uppercase the output:
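import re

# Zero-width boundaries: lower->upper, digit->letter, letter->digit.
boundaries_re = re.compile(
    r"((?<=[a-z])(?=[A-Z]))"
    r"|((?<=[0-9])(?=[a-zA-Z]))"
    r"|((?<=[a-zA-Z])(?=[0-9]))"
)

def format_var(text):
    # Insert "_" at every boundary, then uppercase the result.
    return boundaries_re.sub("_", text).upper()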
>>> format_var("vn3b")
'VN_3_B'
>>> format_var("vnRbb250V")
'VN_RBB_250_V'
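As for why the original pattern did not split these inputs: two of its alternatives, [a-z0-9]{2,} and [0-9A-Z]{2,}, put digits and letters in the same character class, so a run such as vn3b or 250V is consumed as a single token and no underscore is placed inside it. The sketch below tests just those two alternatives in isolation rather than the full alternation, and the variable names are illustrative:

```python
import re

# Two alternatives copied from the original pattern, tested in isolation.
lower_or_digit_run = re.compile(r"[a-z0-9]{2,}")
upper_or_digit_run = re.compile(r"[0-9A-Z]{2,}")

print(lower_or_digit_run.findall("vn3b"))  # ['vn3b']
print(upper_or_digit_run.findall("250V"))  # ['250V']
```

Matching tokens makes it hard to separate digits from letters without listing every combination, which is why matching the boundaries themselves is simpler here.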