How to parse data that is being fetched from a text file?

Time:12-15

I am fetching data from a text file in which I need to match a substring to find the matching line. Once I have that line, I need to get the third value in it, an 8-digit number that follows the second "|" delimiter. All the values have varying lengths and are separated by the delimiter "|", except the first substring (the id), which has a fixed length and fixed start and end positions.

Text file data example:

    0123456|BHKAHHHHkk|12345678|JuiKKK121255
    9100450|HHkk|12348888|JuiKKK10000000021sdadad255

Code:

$file = 'file.txt';

// the following line prevents the browser from parsing this as HTML.
header('Content-Type: text/plain');

// get the file contents, assuming the file to be readable (and exist)
$contents = file_get_contents($file);
// split the contents into an array of lines
$txt = explode("\n",$contents);

$counter = 0;
foreach($txt as $key => $line){
    $subbedString = substr($line,2,6);

    $searchfor = '123456';
    //echo strpos($subbedString,$searchfor); 
    if(strpos($subbedString,$searchfor) === 0){
        $matches[$key] = $searchfor;
        $matchesLine[$key] = substr($line,2,50);
        echo "<p>" . $matchesLine[$key] . "</p>";

        $counter += 1;
        if ($counter == 10) break;
    }
}

CodePudding user response:

  1. If you need to split a file's contents on line breaks, it is simpler to use the file function, which returns the lines as an array.
  2. To split a line into parts of unknown length on a delimiter, use the explode function.

Code:

$file = 'file.txt';
$txt = file($file);

$counter = 0;
foreach ($txt as $key => $line) {
    $line = \trim($line);
    $substrings = explode('|', $line);
    
    // explode() always returns at least one element, so skip blank lines explicitly
    if ($line === '') {
        continue;
    }

    $searchFor = '123456';
    if (substr($substrings[0], 1) === $searchFor) {
        if (!isset($substrings[2])) {
            continue;
        }

        $matches[$key] = $searchFor;

        $matchesLine[$key] = $line;
        echo  "<p>" . $substrings[2] . "</p>";

        if (++$counter === 10) {
            break;
        }
    }
}

I also noticed that the ids in your example have 7 digits, while your code assumes 6 digits (and the $searchfor variable didn't match anything).
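
The same idea can also be wrapped in a small function. This is only a sketch: findThirdFields and its $limit parameter are names made up for illustration, and it assumes the id is the whole first field (7 digits in your sample data) rather than a 6-digit substring of it.

```php
<?php
// Hypothetical helper: returns the third field of every line whose first
// field (the id) equals $id, stopping after $limit matches.
function findThirdFields(array $lines, string $id, int $limit = 10): array
{
    $found = [];
    foreach ($lines as $line) {
        $fields = explode('|', trim($line));
        // Skip lines with a different id or with fewer than three fields.
        if ($fields[0] !== $id || !isset($fields[2])) {
            continue;
        }
        $found[] = $fields[2];
        if (count($found) === $limit) {
            break;
        }
    }
    return $found;
}

// Sample lines from the question; for a real file they could come from
// file('file.txt', FILE_IGNORE_NEW_LINES).
$lines = [
    '0123456|BHKAHHHHkk|12345678|JuiKKK121255',
    '9100450|HHkk|12348888|JuiKKK10000000021sdadad255',
];
$result = findThirdFields($lines, '0123456');
print_r($result);
```

Comparing the whole first field avoids counting character positions by hand, which is where the 6-versus-7-digit mismatch came from.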

CodePudding user response:

Use

^(\d+)\|[^|]*\|(\d{8})\|

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \|                       '|'
--------------------------------------------------------------------------------
  [^|]*                    any character except: '|' (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  \|                       '|'
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    \d{8}                    digits (0-9) (8 times)
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  \|                       '|'

Sample code:

<?php

$re = '/^(\d+)\|[^|]*\|(\d{8})\|/m';
$str = '0123456|BHKAHHHHkk|12345678|JuiKKK121255
9100450|HHkk|12348888|JuiKKK10000000021sdadad255';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

// Print the entire match result
var_dump($matches);

Sample output:

array(2) {
  [0]=>
  array(3) {
    [0]=>
    string(28) "0123456|BHKAHHHHkk|12345678|"
    [1]=>
    string(7) "0123456"
    [2]=>
    string(8) "12345678"
  }
  [1]=>
  array(3) {
    [0]=>
    string(22) "9100450|HHkk|12348888|"
    [1]=>
    string(7) "9100450"
    [2]=>
    string(8) "12348888"
  }
}
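
If only the lines for one particular id are needed, the id can be put into the pattern itself. A minimal sketch, assuming the id is the complete first field; thirdFieldFor is a hypothetical helper name, not part of the code above.

```php
<?php
// Hypothetical helper: returns the 8-digit third field when $line starts
// with "$id|", or null when the line does not match.
function thirdFieldFor(string $line, string $id): ?string
{
    // preg_quote() escapes regex metacharacters in the id; '/' is the delimiter.
    $re = '/^' . preg_quote($id, '/') . '\|[^|]*\|(\d{8})\|/';
    return preg_match($re, $line, $m) === 1 ? $m[1] : null;
}

var_dump(thirdFieldFor('0123456|BHKAHHHHkk|12345678|JuiKKK121255', '0123456'));
var_dump(thirdFieldFor('9100450|HHkk|12348888|JuiKKK10000000021sdadad255', '0123456'));
```

The first call yields the string "12345678"; the second yields NULL because the id differs.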