I have a string that I want to extract a pattern from. The string varies slightly and can be either string1 or string2:
string1 = """
Rak penyimpanan berbentuk high chest dengan gaya American Country. Cocok digunakan untuk menyimpan
segala keperluan hunian Anda! Dibuat dengan rangka kayu mahoni, papan mdf dan finishing cat duco berkualitas. Kualitas ekspor akan menjamin kepuasan
Anda. Dikirim jadi, tanpa perakitan. Panjang 76 cm Kedalaman 40 cm Tinggi 120 cm
"""
string2 = """
Rak penyimpanan berbentuk high chest dengan gaya American Country. Cocok digunakan untuk menyimpan
segala keperluan hunian Anda! Dibuat dengan rangka kayu mahoni, papan mdf dan finishing cat duco berkualitas. Kualitas ekspor akan menjamin kepuasan
Anda. Dikirim jadi, tanpa perakitan. P 76 cm L 40 cm T 120 cm
"""
What I want to achieve is to capture the groups and get (51, 23, 47-89). What I have done is create a pattern like this:
pattern = (\bP|Panjang\b).+(\d)+.+(\bL|Kedalaman\b).+(\d)+.+(\bT|Tinggi\b).+(\d)+.[cm]
I tried it at https://regexr.com/, but each group only captures the last digit, such as (1,3,9). What am I missing, given that I already put + after the \d in every group?
CodePudding user response:
Regex
"(?:P|Panjang)\s(?P<P>\d+)\scm\s(?:L|Kedalaman)\s(?P<L>\d+)\scm\s(?:T|Tinggi)\s(?P<T>\d+)\scm"g
About Regex:
- See Regex 101
- It captures three named groups: P, L and T
- Each group should contain the matched digits, because the + quantifier sits inside the group.
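As a minimal sketch of how the named groups above could be read in Python, assuming the standard re module: the names PATTERN, dimensions, long_form and short_form are made up for illustration, and the two sample strings are shortened stand-ins for the tails of string1 and string2. The trailing g in the answer appears to be the tester's global flag, which re.search does not need.

```python
import re

# Pattern from the answer above; the + quantifier sits inside each named group.
PATTERN = re.compile(
    r"(?:P|Panjang)\s(?P<P>\d+)\scm\s"
    r"(?:L|Kedalaman)\s(?P<L>\d+)\scm\s"
    r"(?:T|Tinggi)\s(?P<T>\d+)\scm"
)

# Shortened stand-ins for the last sentence of string1 and string2.
long_form = "Dikirim jadi, tanpa perakitan. Panjang 76 cm Kedalaman 40 cm Tinggi 120 cm"
short_form = "Dikirim jadi, tanpa perakitan. P 76 cm L 40 cm T 120 cm"


def dimensions(text):
    """Return the three measurements as a dict of strings, or None if absent."""
    match = PATTERN.search(text)
    return match.groupdict() if match else None


print(dimensions(long_form))   # {'P': '76', 'L': '40', 'T': '120'}
print(dimensions(short_form))  # {'P': '76', 'L': '40', 'T': '120'}
```

Both spellings of the labels give the same dictionary, so the calling code does not need to know which variant it received.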
CodePudding user response:
You can:
- Change the .+ to be more specific, like \scm\s or \s
- Just match cm instead of using a character class like [cm]+, which might also match ccc
- If you only want the digits, omit the capture groups around the names
For example
\bP(?:anjang)?\s(\d+)\scm\s(?:L|Kedalaman)\s(\d+)\scm\sT(?:inggi)?\s(\d+)\scm\b
Explanation
- \b  A word boundary to prevent a partial word match
- P(?:anjang)?\s  Match P and optionally anjang, then a whitespace character
- (\d+)\scm\s  Capture 1+ digits in group 1, and match cm
- (?:L|Kedalaman)\s  Match L or Kedalaman, then a whitespace character
- (\d+)\scm\s  Capture 1+ digits in group 2, and match cm
- T(?:inggi)?\s  Match T and optionally inggi, then a whitespace character
- (\d+)\scm  Capture 1+ digits in group 3, and match cm
- \b  A word boundary
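The sketch below, again assuming Python's re module, shows why the original attempt returned only the last digit and how the pattern above behaves on both label variants. A quantifier placed after a capture group, as in (\d)+, repeats the group and keeps only its final iteration, while (\d+) captures the whole run. The names PATTERN and dimensions and the sample strings are illustrative, not part of the question.

```python
import re

sample = "Panjang 76 cm"

# Quantifier outside the group: the group is overwritten on each repetition,
# so only the final digit is kept.
repeated_group = re.search(r"(\d)+\scm", sample)
# Quantifier inside the group: the whole run of digits is captured.
grouped_run = re.search(r"(\d+)\scm", sample)

print(repeated_group.group(1))  # 6
print(grouped_run.group(1))     # 76

# Pattern from the answer above, using positional groups.
PATTERN = re.compile(
    r"\bP(?:anjang)?\s(\d+)\scm\s(?:L|Kedalaman)\s(\d+)\scm\sT(?:inggi)?\s(\d+)\scm\b"
)


def dimensions(text):
    """Return (length, depth, height) as integers, or None if nothing matches."""
    match = PATTERN.search(text)
    return tuple(int(value) for value in match.groups()) if match else None


print(dimensions("tanpa perakitan. Panjang 76 cm Kedalaman 40 cm Tinggi 120 cm"))  # (76, 40, 120)
print(dimensions("tanpa perakitan. P 76 cm L 40 cm T 120 cm"))                     # (76, 40, 120)
```

Converting to int is optional; it is done here only so that the result is a tuple of numbers rather than strings.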