How to break item and price with preg_match_all() or substr() or explode()?

Time:09-03

Galax RTX2060 1 Click 12GB OC .................729 GALAX GTX 1660 SUPER 1-CLICK OC 6GB ...679 GALAX RTX3050 8GB GDDR6 ......................450 

I am wondering whether I can split the string at each (product ... price) boundary:

  1. Galax RTX2060 1 Click 12GB OC .................729
  2. GALAX GTX 1660 SUPER 1-CLICK OC 6GB ...679
  3. GALAX RTX3050 8GB GDDR6 ......................450

Then str_replace('.', ' ') -> Galax RTX2060 1 Click 12GB OC 729

Finally, explode() on the last space (just before the price) to get:

  • array[0] = Galax RTX2060 1 Click 12GB OC
  • array[1] = 729

What I have tried so far:

 preg_match_all('/...\d+/', $input_lines, $output_array);
 array(1
 0   =>  array(3
 0   =>  .729
 1   =>  .679
 2   =>  .450
 )
 )

 $pattern = '/(?P<digit>\d+) (?P<name>\w+)/';
 preg_match_all($pattern, $text, $matches);
 array(5
 0   =>  array(5
 0   =>  2060 1
 1   =>  729 GALAX
 2   =>  1660 SUPER
 3   =>  679 GALAX
 4   =>  3050 8GB
 )
 digit   =>  array(5
 0   =>  2060
 1   =>  729
 2   =>  1660
 3   =>  679
 4   =>  3050
 )
 1   =>  array(5
 0   =>  2060
 1   =>  729
 2   =>  1660
 3   =>  679
 4   =>  3050
 )
 name    =>  array(5
 0   =>  1
 1   =>  GALAX
 2   =>  SUPER
 3   =>  GALAX
 4   =>  8GB
 )
 2   =>  array(5
 0   =>  1
 1   =>  GALAX
 2   =>  SUPER
 3   =>  GALAX
 4   =>  8GB
 )
 )

CodePudding user response:

preg_match_all() can be made to work here:

$input_lines = "Galax RTX2060 1 Click 12GB OC .................729 GALAX GTX 1660 SUPER 1-CLICK OC 6GB ...679 GALAX RTX3050 8GB GDDR6 ......................450";
preg_match_all('/\s*(.*?)\s*\.{3,}\s*(\d+(?:\.\d+)?)/', $input_lines, $output_array);
print_r($output_array);

This places the product descriptions in $output_array[1] and the prices in $output_array[2], as two separate 1D arrays; $output_array[0] holds the full matches.
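
If item/price pairs are more convenient than two parallel arrays, the PREG_SET_ORDER flag groups the captures per match instead. The following is a minimal sketch under that assumption; the names $matches and $products and the 'item'/'price' keys are illustrative and not part of the original answer.

```php
<?php
$input_lines = "Galax RTX2060 1 Click 12GB OC .................729 GALAX GTX 1660 SUPER 1-CLICK OC 6GB ...679 GALAX RTX3050 8GB GDDR6 ......................450";

// PREG_SET_ORDER yields one sub-array per match:
// [0] => full match, [1] => description, [2] => price
preg_match_all(
    '/\s*(.*?)\s*\.{3,}\s*(\d+(?:\.\d+)?)/',
    $input_lines,
    $matches,
    PREG_SET_ORDER
);

// Build one item/price pair per product (hypothetical key names).
$products = [];
foreach ($matches as $m) {
    $products[] = ['item' => $m[1], 'price' => $m[2]];
}

print_r($products);
```

Note that the prices come back as strings; cast them with (int) or (float) if arithmetic is needed.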

Here is an explanation of the regex pattern:

  • \s* match optional leading whitespace
  • (.*?) match and capture the product description
  • \s* optional whitespace
  • \.{3,} match ellipsis ... defined as 3 or more contiguous dots
  • \s* more optional whitespace
  • (\d+(?:\.\d+)?) match and capture the price as an integer or float
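
The step-by-step idea from the question (cut the string into chunks, replace the dots, then split at the last space) can also be sketched directly. This is only one possible reading of that plan: it uses preg_replace() rather than str_replace('.', ' ') so that the whole run of dots collapses into a single space, and it uses strrpos() to find the last space because explode() has no "last delimiter" mode. The variable names $chunks, $line and $result are illustrative.

```php
<?php
$input_lines = "Galax RTX2060 1 Click 12GB OC .................729 GALAX GTX 1660 SUPER 1-CLICK OC 6GB ...679 GALAX RTX3050 8GB GDDR6 ......................450";

// Step 1: cut the string into "description .....price" chunks.
preg_match_all('/\S.*?\.{3,}\s*\d+(?:\.\d+)?/', $input_lines, $chunks);

$result = [];
foreach ($chunks[0] as $chunk) {
    // Step 2: collapse the run of dots (and surrounding spaces) into one space.
    $line = preg_replace('/\s*\.{3,}\s*/', ' ', $chunk);

    // Step 3: split at the last space, which sits just before the price.
    $pos = strrpos($line, ' ');
    $result[] = [substr($line, 0, $pos), substr($line, $pos + 1)];
}

print_r($result);
```

Each entry of $result then has the description at index 0 and the price at index 1, matching the array layout described in the question.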