Compress PHP array of characters to shorter array with a-z, A-Z, 0-9

Time:06-13

I have a PHP array like this:

array('A','B','C','F','a','b','c','d','e','f','h','i','j','k','l','o','q','!','?','0','2','3','4','9')

How can I convert this to:

array('A-C','F','a-f','h-l','o','q','!','?','0','2-4','9')

There will only be standard ASCII characters - nothing fancy. Every character can only appear once.

CodePudding user response:

This is tricky because you have to track whether the current character falls in a "range" (e.g. A-Z) and, if so, whether it is the next sequential character in that range or a non-sequential character in the same range (which means a new range has to be started).

Here is code that produces the desired result, and also prints the tests being done at every step to determine whether to output a single character (e.g. '?') or a range of characters (e.g. 'a-f').

// Starting array:
$a = array('A','B','C','F','a','b','c','d','e','f','h','i','j','k','l','o','q','!','?','0','2','3','4','9');

// Determine range that $ch is in and return an array with the start and end chars of that range, or an array of empty strings if no range.
function r( $ch ) {
    if (
        $ch >= 'A' && $ch <= 'Z'
    ) {
        return ['A', 'Z'];
    }

    if (
        $ch >= 'a' && $ch <= 'z'
    ) {
        return ['a', 'z'];
    }

    if (
        $ch >= '0' && $ch <= '9'
    ) {
        return ['0', '9'];
    }

    return ['', ''];
}

$b = []; // Destination array

// Determine starting range (if any):
$current_range = $ch_range = r($a[0]);
if ( strlen( $current_range[0] ) ) {
    $range_start = $a[0];
    echo "Current range is " . $current_range[0] . '-' . $current_range[1] . " (starting at $range_start)\n";
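} else {
    echo "Not currently in a range, so adding [" . $a[0] . "]\n";
    $b[] = $a[0];      // A non-range first character would otherwise be dropped
    $range_start = ''; // Keep $range_start defined for the check after the loop
}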

echo "Starting loop\n";

for ( $i = 1; $i < sizeof($a); $i++ ) {
    $ch = $a[$i];
    echo "Got $ch\n";
    $ch_range = r($ch); // Range that current char is in

    // Are we currently in a range?
    if ( strlen( $current_range[0] ) ) {
        // Are we still in a range?
        if ( strlen( $ch_range[0] ) ) { // Yes
            // Are we currently in the same range?
            if ( $ch_range[0] === $current_range[0] ) {
                // Are we at the next sequential char?
                if (
                    ord($a[$i-1]) + 1 == ord($ch) // We are at the next char in the range
                ) {
                    echo "We are at the next character in the range " . $current_range[0] . '-' . $current_range[1] . " (starting at $range_start) so doing nothing\n";
                } else {
                    echo "We are at a non-sequential character in the current range so adding [" . $range_start . '-' . $a[$i-1] . "] and starting a new range " . $ch_range[0] . '-' . $ch_range[1] . " (starting at $ch)\n";
                    $b[] = ($range_start == $a[$i-1]) ? $range_start : ( $range_start . '-' . $a[$i-1] );
                    $current_range = $ch_range;
                    $range_start = $ch;
                }
            } else {
                echo "Detected switch from range " . $current_range[0] . '-' . $current_range[1] . " to " . $ch_range[0] . '-' . $ch_range[1] . " so adding [" . $range_start . '-' . $a[$i-1] . "]\n";
                $b[] = ($range_start == $a[$i-1]) ? $range_start : ( $range_start . '-' . $a[$i-1] );
                $current_range = $ch_range;
                $range_start = $ch;
            }
        } else {
            // We were in a range, but no longer
            echo "We were in range " . $current_range[0] . '-' . $current_range[1] . " (starting with $range_start) but no longer, so adding range [$range_start-" . $a[$i-1] . "]\n";
            $b[] = ($range_start == $a[$i-1]) ? $range_start : ( $range_start . '-' . $a[$i-1] );
            $current_range = $ch_range;
            $range_start = $ch;
            
            // If not starting a range, add this ch:
            if ( ! strlen( $ch_range[0] ) ) {
                echo "Not starting a new range with '$ch' so adding [$ch]\n";
                $b[] = $ch;
                $range_start = '';
            }
        }
    } else { // We are not in a range
        // Are we starting a new range?
        if ( strlen( $ch_range[0] ) ) {
            echo "We are not in a range but starting a new one [" . $ch_range[0] . '-' . $ch_range[1] . "]\n";
            $current_range = $ch_range;
            $range_start = $ch;
        } else {
            echo "We are not in a range and not starting one, so adding $ch\n";
            $b[] = $ch;
            $current_range = ['', '']; // Same shape as r() returns, so $current_range[0] stays defined
            $range_start = '';
        }
    }
}

// Did we finish in the middle of a range?
if ( strlen( $range_start ) ) {
    echo "Finished in the middle of a range so adding [$range_start-" . $a[$i-1] . "]\n";
    $b[] = ($range_start == $a[$i-1]) ? $range_start : ( $range_start . '-' . $a[$i-1] );
}

print_r($b);

// Intended output: array('A-C','F','a-f','h-l','o','q','!','?','0','2-4','9')

There is one line of code which is repeated four times, and which could probably be refactored into a separate function:

$b[] = ($range_start == $a[$i-1]) ? $range_start : ( $range_start . '-' . $a[$i-1] );
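One possible way to do that refactoring is sketched below. The function name format_range is made up for illustration and is not part of the code above; it only assumes that both arguments are single characters.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the answer above).
// Formats one run of characters: a run of length 1 becomes the character
// itself, a longer run becomes "start-end".
function format_range(string $start, string $end): string {
    return ($start === $end) ? $start : $start . '-' . $end;
}

// Each of the four repeated lines could then be written as:
// $b[] = format_range($range_start, $a[$i-1]);

echo format_range('A', 'C'), "\n"; // prints A-C
echo format_range('F', 'F'), "\n"; // prints F
```

This sketch uses a strict comparison (===) where the repeated line uses ==; for single characters the result is the same.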

CodePudding user response:

Here is a technique that relies on ord() and the ASCII table integer values of the characters to determine consecutive ranges. I do realize that this does not restrict ranges to your a-z, A-Z and 0-9 groups, but it suffices for your sample data. The rest of the algorithm can remain the same if you need to tweak the consecutive check logic.

In a nutshell, you check if the current range is empty, or check if the character immediately follows the last character order-wise, otherwise push the imploded range and start again.

Code:

$chars = ['A','B','C','F','a','b','c','d','e','f','h','i','j','k','l','o','q','!','?','0','2','3','4','9'];

$result = [];
$range = [null];
foreach ($chars as $char) {
    if ($range[0] === null) {
        $range = [$char];
    } elseif (ord($range[1] ?? $range[0]) === ord($char) - 1) {
        $range[1] = $char;
    } else {
        $result[] = implode('-', $range);
        $range = [$char];
    }
}
if ($range[0] !== null) {
    $result[] = implode('-', $range);
}
var_export($result);

Output:

array (
  0 => 'A-C',
  1 => 'F',
  2 => 'a-f',
  3 => 'h-l',
  4 => 'o',
  5 => 'q',
  6 => '!',
  7 => '?',
  8 => '0',
  9 => '2-4',
  10 => '9',
)
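
If the consecutive check does need tweaking so that ranges never cross out of a-z, A-Z or 0-9, a minimal sketch could look like the following. The names char_class and compress_chars are hypothetical, and the rule that punctuation never forms a range is an assumption based on the question, not something either answer states.

```php
<?php
// Hypothetical helper: names the group a single ASCII character belongs to.
function char_class(string $ch): string {
    $o = ord($ch);
    if ($o >= 65 && $o <= 90)  return 'upper'; // A-Z
    if ($o >= 97 && $o <= 122) return 'lower'; // a-z
    if ($o >= 48 && $o <= 57)  return 'digit'; // 0-9
    return 'other';
}

// Same loop shape as the code above, with a stricter "extends the range" test.
function compress_chars(array $chars): array {
    $result = [];
    $range = [];
    foreach ($chars as $char) {
        $last = $range[1] ?? $range[0] ?? null; // last character of the open range, if any
        $extends = $last !== null
            && char_class($char) !== 'other'             // assumption: punctuation never forms a range
            && char_class($char) === char_class($last)   // stay inside one group
            && ord($last) + 1 === ord($char);            // and be the next character
        if ($extends) {
            $range[1] = $char;
        } else {
            if ($range) {
                $result[] = implode('-', $range);
            }
            $range = [$char];
        }
    }
    if ($range) {
        $result[] = implode('-', $range);
    }
    return $result;
}

var_export(compress_chars(['8', '9', ':', ';']));
```

With this check, '9' and ':' are kept apart even though their ASCII codes are adjacent, so the call above yields '8-9', ':' and ';' as three separate entries. For the sample array in the question the result is the same as the output shown above.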