Extracting numbers using Regex in Matlab

Time:11-01

I would like to extract integers from the strings in a cell array in Matlab. Each string contains 1 or 2 integers, formatted as shown below, and each integer has one or two digits. I would like to convert each string to a 1x2 array. If there is only one number in the string, the second entry should be -1. If there are two numbers, the first entry should be the first number and the second entry should be the second number.

'[1, 2]'
'[3]'
'[10, 3]'
'[1, 12]'
'[11, 12]'

Thank you very much!

I have tried a few different methods that did not work out. I think that I need to use regex and am having difficulty finding the proper expression.

CodePudding user response:

You can use str2num to convert well-formatted chars (which you appear to have) to the correct arrays/scalars. Then pad with -1 from element end+1 up to element 2; this does nothing when there are already two elements.

This is most clearly done in a small loop, see the comments for details:

% Set up the input
c = { ...
    '[1, 2]'
    '[3]'
    '[10, 3]'
    '[1, 12]'
    '[11, 12]'
    };

n = cell(size(c));          % Initialise output
for ii = 1:numel(n)         % Loop over chars in 'c'
    n{ii} = str2num(c{ii}); % convert char to numeric array
    n{ii}(end+1:2) = -1;    % Extend (if needed) to 2 elements = -1
end

% (Optional) Convert from a cell to an Nx2 array
n = cell2mat(n);
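
As a quick sanity check, here is a minimal self-contained sketch of the same loop run on the five sample strings from the question. The variable names v and expected are only illustrative, and the expected matrix is written out by hand from the rules in the question. Note that str2num evaluates its input as code, so this assumes the strings come from a trusted source.

```matlab
% Sample input from the question (5x1 cell of char vectors)
c = {'[1, 2]'; '[3]'; '[10, 3]'; '[1, 12]'; '[11, 12]'};

n = cell(size(c));
for ii = 1:numel(c)
    v = str2num(c{ii});   % evaluates the text, e.g. '[10, 3]' -> [10 3]
    v(end+1:2) = -1;      % index range is empty when v already has 2 elements
    n{ii} = v;
end
n = cell2mat(n);          % stack the 1x2 rows into a 5x2 numeric array

% Hand-written expectation based on the rules in the question
expected = [1 2; 3 -1; 10 3; 1 12; 11 12];
disp(isequal(n, expected));
```

The padding line works because end+1:2 is 2:2 for a scalar and the empty range 3:2 for a two-element row, so the assignment only touches a missing second element.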

If you really wanted to use regex, you could replace the loop part with something similar:

n = regexp( c, '\d{1,2}', 'match' ); % Match between one and two digits
for ii = 1:numel(n)
    n{ii} = str2double(n{ii});       % Convert cellstr of chars to arrays
    n{ii}(end+1:2) = -1;             % Pad to be at least 2 elements
end
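
To see what the regex version does to a single string, the following sketch walks through the intermediate values for two of the sample inputs. The names s1, s2, tok1, v1 and v2 are hypothetical and used only for illustration.

```matlab
% Two-number case
s1   = '[10, 3]';
tok1 = regexp(s1, '\d{1,2}', 'match');  % cell array of matched digit runs
v1   = str2double(tok1);                % numeric row vector, one entry per match
v1(end+1:2) = -1;                       % nothing to pad here

% One-number case
s2 = '[3]';
v2 = str2double(regexp(s2, '\d{1,2}', 'match'));
v2(end+1:2) = -1;                       % fills the missing second entry with -1

disp(v1);
disp(v2);
```

When regexp is called on the whole cell array c instead of a single char vector, it returns one such cell of matches per input string, which is why the loop above converts each n{ii} separately.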

But there are lots of ways to do this without touching regex. For example, you could erase the square brackets, split on the comma, and pad with -1 according to whether or not there's a comma in each string. Wrap it all in a cellfun, which is much harder to read than a loop, and ta-dah, you get a one-liner:

n = cellfun( @(x) [str2double( strsplit( erase(x,{'[',']'}), ',' ) ), -1*ones(1,1-nnz(x==','))], c, 'uni', 0 );
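
Because the one-liner is dense, it may help to unpack the anonymous function into named steps for a single string. This is only a sketch: the intermediate names inner, parts, vals, nPad and row are hypothetical, and erase needs a reasonably recent MATLAB release (R2016b or later).

```matlab
x = '[3]';                            % try '[1, 12]' for the two-number case

inner = erase(x, {'[', ']'});         % strip the square brackets
parts = strsplit(inner, ',');         % split on the comma (one part if none)
vals  = str2double(parts);            % str2double ignores surrounding spaces
nPad  = 1 - nnz(x == ',');            % 1 if there is no comma, 0 if there is one
row   = [vals, -1*ones(1, nPad)];     % append -1 only when a number is missing

disp(row);
```

The padding count relies on each string holding at most one comma, which matches the input format described in the question; a string with more commas would make nPad negative and ones would then return an empty array rather than raise an error.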

I'd recommend one of the loops for ease of reading and debugging.
