File filtering using MATLAB regular expressions

Time:10-29

I need to use regular expressions to filter a directory of .nii files and create a cell array of the files that match my requirements. However, I am new to MATLAB, and there are too many requirements for me to work out the regular expression myself.

The desired cell array is displayed in the attached image (but note that the nii files have 20 frames so I want the script to loop through 20 times):

[Image: desired cell array of files]

Here are the files in the directory (along with other ones that I am not interested in):

[Image: files in directory]

In the past, I have been able to create these arrays by just using dir and filtering with '*.nii'. However, the directory I am working with has loads of different .nii files, so that filter would not be specific enough here.

As you can see from the picture, the pattern I am looking for is 'ica_sub', 3 digits, 'component_ica_s', 1 digit, '.nii,' and a final digit.

As explained above, the nii files have 20 frames so the 'final digit' in the expression will need to be a variable containing the numbers 1 to 20 (which I will loop through).

I am just really confused about how to combine all of this together. If anyone can help me out I would appreciate it so much. Gerard

I have tried to look up examples of regex in MATLAB, but I am struggling to understand the syntax.

CodePudding user response:

You can get all .nii files in a directory using one of these:

files = dir( fullfile( 'myDirectory', '*.nii' ) );       % single directory
files = dir( fullfile( 'myDirectory', '**', '*.nii' ) ); % recursive, including nested directories

I like to convert this struct to a table for slightly easier manipulation:

files = struct2table( files, 'AsArray', true );

Then you can use regex to filter for this pattern:

'ica__sub' 3 digits 'component_ica_s' 1 digit '.nii'

I took the liberty of adding a 2nd underscore to ica__sub because that's what your screenshot shows.

The regular expression filtering uses \d{n} to match n digits, and is prefixed with ^ to assert the name starts with ica_... rather than just containing this string.

% Get the matches, will be empty for each unmatched name
matchedParts = regexp( files.name, '^ica__sub\d{3}component_ica_s\d{1}\.nii', 'match', 'once' );
% Get which rows are a match
idx = ~cellfun( @isempty, matchedParts );
% Filter the directory results
files = files( idx, : );
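To see what the pattern accepts and rejects, here is a small sketch that applies the same regular expression to a plain cell array of names. The file names are made up for illustration and are not taken from your directory.

```matlab
% Hypothetical file names, invented purely to exercise the pattern
names = { 'ica__sub001component_ica_s1.nii', ...       % matches
          'ica__sub002component_ica_s1.nii', ...       % matches
          'ica__sub01component_ica_s1.nii', ...        % only 2 digits after "sub"
          'mean_ica__sub001component_ica_s1.nii', ...  % does not start with "ica__"
          'ica__sub001timecourses_ica_s1.nii' };       % different middle part

% Same pattern as above
pattern = '^ica__sub\d{3}component_ica_s\d{1}\.nii';

% With 'once', each cell holds the matched text, or is empty if there is no match
matched = regexp( names, pattern, 'match', 'once' );
isMatch = ~cellfun( @isempty, matched );

% Keep only the names that matched
kept = names( isMatch );
disp( kept );
```

Only the first two names should survive the filter, since the others break the digit count, the ^ anchor, or the literal text.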

Now you can loop over these files and do whatever you want with them

for ii = 1:height(files)
    filename = fullfile( files.folder{ii}, files.name{ii} );
    % do something with filename...
    % Maybe loop 1 to 20 for frames within each file
    for kk = 1:20
        file_frame = [filename, ',', num2str(kk)]; 
        % something else...
    end
end
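If what you need is one cell array per frame (every matching file with ',k' appended), something like the sketch below might work. It assumes the full paths are already collected in a cell array; the two paths here are hypothetical placeholders, and frameLists and nFrames are names I made up.

```matlab
% Hypothetical full paths of the files that passed the filter
filenames = { fullfile( 'myDirectory', 'ica__sub001component_ica_s1.nii' ), ...
              fullfile( 'myDirectory', 'ica__sub002component_ica_s1.nii' ) };

nFrames = 20;                     % number of frames per .nii file
frameLists = cell( nFrames, 1 );  % frameLists{kk} holds the list for frame kk

for kk = 1:nFrames
    % Append ",kk" to every file name, giving an N-by-1 cell array for this frame
    frameLists{kk} = cellfun( @(f) sprintf( '%s,%d', f, kk ), ...
                              filenames(:), 'UniformOutput', false );
end

% Example: the list of files for frame 3
disp( frameLists{3} );
```

Each frameLists{kk} can then be passed on as the cell array for frame kk.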