Extract all numbers and text separately

Time:06-02

I have a text like this:

81. Text1

82. Text2
82.1. Some text3
82.2. Some long text goes there in two or more lines... Some more text goes here...
 
83. Text4

84. Text5

It has some random spacing between the lines. I'm trying to extract every single option separately. So for example my output for 82.2. should be like this: "82.2." and "Some long text goes there in two or more lines... Some more text goes here...".

I've already tried it like this:

$exp = explode(". ", $text);
foreach ($exp as $newline) {
    echo explode(". ", $newline)[0];
}

But that's probably not the best idea, because sometimes there's a ". " at the end of a sentence.

CodePudding user response:

You're on the right track making use of explode:

$output = [];
$input = '81. Text1

82. Text2
82.1. Some text3
82.2. Some long text goes there in two or more lines... Some more text goes here...
 
83. Text4

84. Text5';

// split lines, trim any whitespace on each line and remove any that are empty
// PHP_EOL may need to be changed to how newlines are encoded in the text file
$lines = array_filter(array_map('trim', explode(PHP_EOL, $input)));

foreach ($lines as $line) {
    $split = explode('. ', $line);

    // The number will be the first element
    $number = trim(array_shift($split));

    // Join the rest of the elements together, restoring the '. ' separator
    // that explode() removed
    $text = implode('. ', $split);

    $output[] = [
        'number' => $number,
        'text' => $text
    ];
}

var_dump($output);

This yields:

array(6) {
  [0]=>
  array(2) {
    ["number"]=>
    string(2) "81"
    ["text"]=>
    string(5) "Text1"
  }
  [1]=>
  array(2) {
    ["number"]=>
    string(2) "82"
    ["text"]=>
    string(5) "Text2"
  }
  [2]=>
  array(2) {
    ["number"]=>
    string(4) "82.1"
    ["text"]=>
    string(10) "Some text3"
  }
  [3]=>
  array(2) {
    ["number"]=>
    string(4) "82.2"
    ["text"]=>
    string(77) "Some long text goes there in two or more lines... Some more text goes here..."
  }
  [4]=>
  array(2) {
    ["number"]=>
    string(2) "83"
    ["text"]=>
    string(5) "Text4"
  }
  [5]=>
  array(2) {
    ["number"]=>
    string(2) "84"
    ["text"]=>
    string(5) "Text5"
  }
}
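The comment about PHP_EOL above points at a real fragility: the split only works when the text uses the same newline convention as the platform running the script. A minimal sketch of one way around that, assuming the input may contain \n, \r\n or \r, is to split on the PCRE \R escape and cap explode at two parts so that later ". " sequences stay inside the text. The function name splitNumberedLines is hypothetical and not part of the answer above.

```php
<?php
// Hypothetical helper (not from the answer above): splits numbered lines
// without depending on the platform's newline convention.
function splitNumberedLines(string $input): array
{
    $output = [];
    // \R matches \n, \r\n and \r, so the origin of the text does not matter
    foreach (preg_split('/\R/', $input) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank or whitespace-only lines
        }
        // Limit of 2: only the first ". " separates the number from the text
        $parts = explode('. ', $line, 2);
        $output[] = [
            'number' => $parts[0],
            'text'   => $parts[1] ?? '',
        ];
    }
    return $output;
}

$rows = splitNumberedLines("81. Text1\r\n\r\n82.2. Some long text... Some more text...\r\n");
print_r($rows);
```

As in the answer above, the number comes back without its trailing period ("82.2" rather than "82.2."); append one when building the array if the period is needed.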

CodePudding user response:

You can use the limit parameter of the explode function to only get two results:

$str = <<<EOD
81. Text1

82. Text2
82.1. Some text3
82.2. Some long text goes there in two or more lines... Some more text goes here...
 
83. Text4

84. Text5
EOD;

foreach (explode("\n", $str) as $line) {
    if (trim($line) == "") {
        continue;
    }
    list($prefix, $text) = explode(" ", $line, 2);
    echo $prefix . " -> " . $text . "\n";
}

This prints:

81. -> Text1
82. -> Text2
82.1. -> Some text3
82.2. -> Some long text goes there in two or more lines... Some more text goes here...
83. -> Text4
84. -> Text5
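The question mentions text that runs over "two or more lines". If those are real line breaks in the input rather than visual wrapping, a line-by-line split would treat each continuation as a separate entry. The following is a rough sketch under that assumption: any non-empty line that does not start with a number prefix is appended to the previous entry. The function name parseWithContinuations and the prefix pattern are illustrative choices, not something given in the question.

```php
<?php
// Hypothetical helper: folds wrapped lines into the preceding numbered entry.
function parseWithContinuations(string $input): array
{
    $entries = [];
    foreach (preg_split('/\R/', $input) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // ignore blank or whitespace-only lines
        }
        // A new entry starts with dot-separated digits ending in a dot, e.g. "82.1."
        if (preg_match('/^(\d+(?:\.\d+)*\.)\s+(.*)$/', $line, $m)) {
            $entries[] = ['prefix' => $m[1], 'text' => $m[2]];
        } elseif ($entries) {
            // Anything else is assumed to continue the previous entry
            $entries[count($entries) - 1]['text'] .= ' ' . $line;
        }
    }
    return $entries;
}

$entries = parseWithContinuations(
    "82.1. Some text3\n82.2. Some long text goes there\nin two or more lines...\n\n83. Text4"
);
foreach ($entries as $e) {
    echo $e['prefix'] . ' -> ' . $e['text'] . "\n";
}
```

One limitation of this sketch: a continuation line that itself begins with something like "3. " would be mistaken for a new entry.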

CodePudding user response:

You can use a simple multiline regex to split the text in just two lines of code.

  • Match the digits and periods at the start of each line and capture them in a group: ^([\d.]+)
  • Match the rest of the line in another group: (.*)$
  • Use preg_match_all to match all of those lines, passing an array as the third parameter to store the matches (say, $matches).
  • Use array_map to pair up captured groups 1 and 2, trimming the space that separates them.

Snippet:

<?php

// $str holds the input text (see the heredoc in the previous answer)
preg_match_all('/^([\d.]+)(.*)$/m', $str, $matches);
$result = array_map(fn($v1, $v2) => [$v1, trim($v2)], $matches[1], $matches[2]);

print_r($result);
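As a variation on the regex approach, preg_match_all accepts the PREG_SET_ORDER flag, which groups the captures per matched line instead of per capture group. That gives the number and text side by side without a separate array_map step. This is only a sketch: the sample string and the [ \t]+ separator between the two groups are assumptions made for illustration.

```php
<?php
// Sample input, shortened from the question for illustration
$str = "81. Text1\n\n82.1. Some text3\n82.2. Some long text... Some more text...";

// PREG_SET_ORDER groups the captures per match instead of per group
preg_match_all('/^([\d.]+)[ \t]+(.*)$/m', $str, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $m) {
    // $m[0] is the whole line, $m[1] the number, $m[2] the text
    $pairs[] = [$m[1], $m[2]];
}

print_r($pairs);
```

Blank lines never match because the pattern requires at least one digit or period at the start of the line, so no extra filtering is needed.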
