Extract blocks from a text file where each block's first line starts with the same string

Time:06-22

I have a dynamic text file whose content can have fixed lines (each appearing only once) and any number of repeated blocks. Each block starts with the same line code, "S21.G00.30.001", but the blocks do not necessarily have the same contents. Here is an extract from the file:

S10.G00.00.001,'www.mywebsite.com' //fixed line
S10.G00.00.002,'Company name' // fixed line
S10.G00.00.003,'v2.01' //fixed line
S10.G00.00.005,'02' //fixed line
.............

S21.G00.30.001,'employee one' //block 1
S21.G00.30.002,'AAAA'
S21.G00.30.004,'BBBB' 
S21.G00.30.005,'02'
S21.G00.30.006,'16021993'
S21.G00.30.007,'4'
S21.G00.30.008,'A Renasca'
S21.G00.30.001,'employee 2' //block 2
S21.G00.30.002,'CCCC'
S21.G00.30.004,'DDDD' 
S21.G00.30.005,'02'
S21.G00.30.001,'employee 3' //block 3
S21.G00.30.002,'EEEE'
S21.G00.30.004,'FFFF' 
S21.G00.30.005,'02'
S21.G00.30.007,'4'
S21.G00.30.008,'some text 3'

To get the values of the fixed lines, which appear only once, I use this method:

$file = fopen($this->getParameter('dsn_txt_folder') . 'dsn.txt', 'r');

if ($file) {
    while (($line = fgets($file)) !== false) {
        if (str_starts_with($line, 'S10.G00.00.001')) {
            $website = $this->getStringBetween($line, "'", "'");
        }
        if (str_starts_with($line, 'S10.G00.00.002')) {
            $companyName = $this->getStringBetween($line, "'", "'");
        }
        if (str_starts_with($line, 'S10.G00.00.003')) {
            $version = $this->getStringBetween($line, "'", "'");
        }
        .......
    }

    fclose($file);
}

But for the repeated blocks, how can I extract each block that starts with the line "S21.G00.30.001" and put it in an array, so that I can easily read the values of each line?

CodePudding user response:

You can parse the file line by line. Keep the current block in an array variable, populate it as rows are parsed and, when a new block starts, add the previous block to the final result array.

The following code uses basic functions (and not $this-> calls, as you have in the question). You can update the code as you wish.

<?php
// the file was placed on my server for testing
$file = fopen('test.txt','r');
// this will contain the final result
$result = [];
// currentBlock is null at first
$currentBlock = null;
while (($line = fgets($file)) !== false) {
    // extracting the line code
    $lineCode = substr($line, 0, 14);
    // checking if the row contains a value, between two '
    $rowComponents = explode("'", $line);
    if (count($rowComponents) < 2) {
        // the row is not formatted ok
        continue;
    }
    $value = $rowComponents[1];
    switch ($lineCode) {
        case 'S10.G00.00.001':
            $website = $value;
            break;
        case 'S10.G00.00.002':
            $companyName = $value;
            break;
        case 'S10.G00.00.003':
            $version = $value;
            break;
        case 'S21.G00.30.001':
            // starting a new entry
            if ($currentBlock !== null) {
                // we already have a block being parsed,
                // so we add it to the final result
                $result[] = $currentBlock;
            }
            // starting the current block as an empty array
            $currentBlock = [];
            $currentBlock['property1'] = $value;
            break;
        case 'S21.G00.30.002':
            $currentBlock['property2'] = $value;
            break;
        case 'S21.G00.30.004':
            $currentBlock['property4'] = $value;
            break;
    }
}
// adding the last entry into the final result
// only if the block exists
if ($currentBlock !== null) {
    $result[] = $currentBlock;
}
fclose($file);
// output the result for debugging
// you also have the $website, $companyName, $version parameters populated
var_dump($result);

?>

After the script runs, I get the following output from the var_dump call:

array(3) {
  [0]=>
  array(3) {
    ["property1"]=>
    string(12) "employee one"
    ["property2"]=>
    string(4) "AAAA"
    ["property4"]=>
    string(4) "BBBB"
  }
  [1]=>
  array(3) {
    ["property1"]=>
    string(10) "employee 2"
    ["property2"]=>
    string(4) "CCCC"
    ["property4"]=>
    string(4) "DDDD"
  }
  [2]=>
  array(3) {
    ["property1"]=>
    string(10) "employee 3"
    ["property2"]=>
    string(4) "EEEE"
    ["property4"]=>
    string(4) "FFFF"
  }
}
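
The switch above only keeps the three block fields it names. As a variation, here is a minimal sketch that keeps every line of a block, keyed by its full line code, so new codes do not need a new case. The function name parseBlocks() and its parameters are my own, not from the question, and I am assuming every line code has the same length as the block-start code and that values never contain a single quote.

```php
<?php
// Hypothetical helper: groups lines into blocks, each keyed by line code.
// $lines can be any iterable of strings (an array, a generator, ...).
function parseBlocks(iterable $lines, string $blockStart = 'S21.G00.30.001'): array
{
    // 'S21.G00.30.' : codes sharing this prefix belong to a block
    $prefix = substr($blockStart, 0, -3);
    $blocks = [];
    $current = null;

    foreach ($lines as $line) {
        // a usable row has a value between two single quotes
        $parts = explode("'", $line);
        if (count($parts) < 3) {
            continue; // blank or badly formatted row
        }
        $code = substr($line, 0, strlen($blockStart));
        $value = $parts[1];

        if ($code === $blockStart) {
            // a new block starts: store the previous one, if any
            if ($current !== null) {
                $blocks[] = $current;
            }
            $current = [];
        }
        // rows seen before the first block (the fixed lines) are ignored here
        if ($current !== null && str_starts_with($code, $prefix)) {
            $current[$code] = $value;
        }
    }
    // store the last block, if any
    if ($current !== null) {
        $blocks[] = $current;
    }
    return $blocks;
}

$lines = [
    "S10.G00.00.001,'www.mywebsite.com' //fixed line",
    "S21.G00.30.001,'employee one' //block 1",
    "S21.G00.30.002,'AAAA'",
    "S21.G00.30.001,'employee 2' //block 2",
    "S21.G00.30.005,'02'",
];
$blocks = parseBlocks($lines);
```

Because the function accepts any iterable, it should also be possible to pass it a file object such as new SplFileObject('test.txt') instead of an array. The fixed lines would still need their own handling, as in the switch above.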

CodePudding user response:

Personally, I would first get all the data with file_get_contents(). Then use preg_match_all() to extract what I need. You can adapt this solution to use fopen(), fgets(), and preg_match() on your own.

A good regex will capture exactly what you need, then it's up to you to organize the data according to your logic. Here is an example that can handle multiple "id" strings:

<?php

//$data = file_get_contents($this->getParameter('dsn_txt_folder') . 'dsn.txt');
$data = "
S10.G00.00.001,'www.mywebsite.com' //fixed line
S10.G00.00.002,'Company name' // fixed line
S10.G00.00.003,'v2.01' //fixed line
S10.G00.00.005,'02' //fixed line
.............

S21.G00.30.001,'employee one' //block 1
S21.G00.30.002,'AAAA'
S21.G00.30.004,'BBBB' sx
S21.G00.30.005,'02'
S21.G00.30.006,'16021993'
S21.G00.30.007,'4'
S21.G00.30.008,'A Renasca'
S21.G00.30.001,'employee 2' //block 2
S21.G00.30.002,'CCCC'
S21.G00.30.004,'DDDD' 
S21.G00.30.005,'02'
S21.G00.30.001,'employee 3' //block 3
S21.G00.30.002,'EEEE'
S21.G00.30.004,'FFFF' 
S21.G00.30.005,'02'
S21.G00.30.007,'4'
S21.G00.30.008,'some text 3'
";
$extracted = [];
$ids = [
    'S21.G00.30.',
    //'S10.G00.00.',
];
foreach($ids as $id){
  $regex = "/^".implode('\\.', explode('.', $id))."(\d{3}),'(.*)'/m"; // "/^S21\.G00\.30\.(\d{3}),'(.*)'/m"
  $matches = [];
  $block = 0;
  preg_match_all($regex, $data, $matches);
  foreach($matches[0] as $i => $full){
    if('001' === $matches[1][$i])
        $block++; // a '001' line opens a new block
    $extracted[$id][$block][$matches[1][$i]] = $matches[2][$i];
  }
}

var_export($extracted);

This will yield the following:

array (
  'S21.G00.30.' => 
  array (
    1 => 
    array (
      '001' => 'employee one',
      '002' => 'AAAA',
      '004' => 'BBBB',
      '005' => '02',
      '006' => '16021993',
      '007' => '4',
      '008' => 'A Renasca',
    ),
    2 => 
    array (
      '001' => 'employee 2',
      '002' => 'CCCC',
      '004' => 'DDDD',
      '005' => '02',
    ),
    3 => 
    array (
      '001' => 'employee 3',
      '002' => 'EEEE',
      '004' => 'FFFF',
      '005' => '02',
      '007' => '4',
      '008' => 'some text 3',
    ),
  ),
)

See it in action here: https://onlinephp.io/c/fc256
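
For the fopen(), fgets(), and preg_match() adaptation mentioned at the top of this answer, a rough sketch could look like the following. The function name extractBlocks() is hypothetical, and the in-memory stream only stands in for the real file so the example runs on its own. It uses preg_quote() to escape the dots in the prefix rather than the implode/explode approach.

```php
<?php
// Hypothetical helper: reads a stream line by line and groups the rows
// whose code starts with $id into numbered blocks.
function extractBlocks($handle, string $id): array
{
    // preg_quote() escapes the dots, giving /^S21\.G00\.30\.(\d{3}),'(.*)'/
    $regex = '/^' . preg_quote($id, '/') . '(\d{3}),\'(.*)\'/';
    $blocks = [];
    $block = 0;

    while (($line = fgets($handle)) !== false) {
        if (!preg_match($regex, $line, $m)) {
            continue; // row does not belong to this prefix
        }
        if ($m[1] === '001') {
            $block++; // a '001' row opens a new block
        }
        $blocks[$block][$m[1]] = $m[2];
    }
    return $blocks;
}

// In-memory stream used in place of fopen('dsn.txt', 'r') for this sketch
$handle = fopen('php://memory', 'w+');
fwrite($handle, implode("\n", [
    "S10.G00.00.001,'www.mywebsite.com' //fixed line",
    "S21.G00.30.001,'employee one' //block 1",
    "S21.G00.30.002,'AAAA'",
    "S21.G00.30.001,'employee 2' //block 2",
    "S21.G00.30.005,'02'",
]) . "\n");
rewind($handle);

$blocks = extractBlocks($handle, 'S21.G00.30.');
fclose($handle);
```

Reading with fgets() keeps only one line in memory at a time, which may matter if the file is large. The trade-off is that the regex runs once per line instead of once over the whole content.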
