Get Number Group of Regex Python

Time:08-07

I have a string that I want to match against a pattern. The string varies slightly, so it can look like either string1 or string2:

string1 = """
    Rak penyimpanan berbentuk high chest dengan gaya American Country.  Cocok digunakan untuk menyimpan 
segala keperluan hunian Anda! Dibuat dengan rangka kayu mahoni, papan mdf dan finishing cat duco berkualitas.  Kualitas ekspor akan menjamin kepuasan 
Anda.  Dikirim jadi, tanpa perakitan. Panjang 76 cm Kedalaman 40 cm Tinggi 120 cm
"""

string2 = """
    Rak penyimpanan berbentuk high chest dengan gaya American Country.  Cocok digunakan untuk menyimpan 
segala keperluan hunian Anda! Dibuat dengan rangka kayu mahoni, papan mdf dan finishing cat duco berkualitas.  Kualitas ekspor akan menjamin kepuasan 
Anda.  Dikirim jadi, tanpa perakitan. P 76 cm L 40 cm T 120 cm
"""

What I want to achieve is to capture the number groups and get something like (51, 23, 47-89). What I have done is create a pattern like this:

pattern = (\bP|Panjang\b).+(\d)+.+(\bL|Kedalaman\b).+(\d)+.+(\bT|Tinggi\b).+(\d)+.[cm]+

I have tried it on https://regexr.com/, but each group only captures the last digit, such as (1, 3, 9). What am I missing, since I already put + after the \d in every group?

CodePudding user response:

Regex

"(?:P|Panjang)\s(?P<P>\d+)\scm\s(?:L|Kedalaman)\s(?P<L>\d+)\scm\s(?:T|Tinggi)\s(?P<T>\d+)\scm"g

About Regex:

  • See Regex 101
  • captures three named groups: P, L and T
  • each group contains only the matched digits.
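
In Python, the named groups can be read back with `groupdict()`. This is only a minimal sketch: it assumes each `\d` in the pattern above is followed by `+`, and the function name `get_dimensions` and the shortened sample strings are made up for illustration.

```python
import re

# The pattern from this answer, assuming every \d is followed by +.
PATTERN = re.compile(
    r"(?:P|Panjang)\s(?P<P>\d+)\scm\s"
    r"(?:L|Kedalaman)\s(?P<L>\d+)\scm\s"
    r"(?:T|Tinggi)\s(?P<T>\d+)\scm"
)


def get_dimensions(text):
    """Return the named groups P, L and T as ints, or None if nothing matches."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


# Shortened stand-ins for the last line of string1 and string2 (hypothetical names).
tail1 = "Dikirim jadi, tanpa perakitan. Panjang 76 cm Kedalaman 40 cm Tinggi 120 cm"
tail2 = "Dikirim jadi, tanpa perakitan. P 76 cm L 40 cm T 120 cm"

print(get_dimensions(tail1))
print(get_dimensions(tail2))
```

Both calls should give the same dictionary, since the alternations accept the long and the short labels.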

CodePudding user response:

You can:

  • Make the . separators more specific, such as \scm\s or \s
  • Match cm literally instead of using a character class: [cm] matches a single c or m, so when repeated it can also match ccc
  • If you only want the digits, you can omit the capture groups around the names
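
To see why the original groups end up with a single digit, here is a small sketch in Python. It assumes the pattern in the question used greedy `.+` separators and put the `+` outside the group, as in `(\d)+`; the sample text is a shortened stand-in.

```python
import re

text = "P 76 cm"

# Greedy .+ takes as much as it can and gives back only what \d+ needs: one digit.
greedy = re.search(r"P.+(\d+)", text)
print(greedy.group(1))    # 6

# With the + outside the parentheses, the group keeps only its last repetition.
repeated = re.search(r"P\s(\d)+", text)
print(repeated.group(1))  # 6

# A specific separator and the + inside the group capture the whole number.
specific = re.search(r"P\s(\d+)\scm", text)
print(specific.group(1))  # 76
```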

For example

\bP(?:anjang)?\s(\d+)\scm\s(?:L|Kedalaman)\s(\d+)\scm\sT(?:inggi)?\s(\d+)\scm\b

Explanation

  • \b A word boundary to prevent a partial word match
  • P(?:anjang)?\s Match P, optionally followed by anjang, then a whitespace character
  • (\d+)\scm\s Capture 1+ digits in group 1, and match cm
  • (?:L|Kedalaman)\s Match L or Kedalaman
  • (\d+)\scm\s Capture 1+ digits in group 2, and match cm
  • T(?:inggi)?\s Match T, optionally followed by inggi, then a whitespace character
  • (\d+)\scm Capture 1+ digits in group 3, and match cm
  • \b A word boundary
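
Used from Python, the pattern explained above could look roughly like this. It is a sketch, not the only way to do it: it assumes each capture group is `(\d+)`, and `extract_sizes` and the sample strings are hypothetical names chosen for the example.

```python
import re

# Pattern from the explanation above, assuming each group is (\d+).
PATTERN = re.compile(
    r"\bP(?:anjang)?\s(\d+)\scm\s"
    r"(?:L|Kedalaman)\s(\d+)\scm\s"
    r"T(?:inggi)?\s(\d+)\scm\b"
)


def extract_sizes(text):
    """Return (P, L, T) as a tuple of ints, or None when the pattern is absent."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


# Shortened stand-ins for the last line of string1 and string2.
long_form = "Dikirim jadi, tanpa perakitan. Panjang 76 cm Kedalaman 40 cm Tinggi 120 cm"
short_form = "Dikirim jadi, tanpa perakitan. P 76 cm L 40 cm T 120 cm"

print(extract_sizes(long_form))
print(extract_sizes(short_form))
```

Because only the digits are captured, `match.groups()` already has exactly three items and no label text to filter out.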

Regex demo
