Let's say I have two lists of files with similar names like so:
images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2', 'mask_im_3']
How can I efficiently remove the elements that have no counterpart in the other list? I want to get the following:
images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2']
I've tried doing the following:
setA = set([x[-4:] for x in images])
setB = set([x[-4:] for x in masks])
matches = setA.union(setB)
elems = list(matches)
for elem in elems:
    result = [x for x in images if x.endswith(elem)]
But this is rather naïve and slow, as I need to iterate through a list of ~100k elements. Any idea how I can implement this efficiently?
Answer:
First of all, since you want the common endings, you should use intersection, not union:
matches = setA.intersection(setB)
Then matches is already a set, so instead of converting it to a list and looping over it, loop over images and masks and check each name's suffix for set membership:
imgres = [x for x in images if x[-4:] in matches]
mskres = [x for x in masks if x[-4:] in matches]
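Putting the pieces of this answer together, a self-contained sketch might look like the following. The function name filter_matching is hypothetical, and the default key mirrors the question's x[-4:], so it assumes every suffix is exactly four characters. It builds each suffix set once and then makes one pass over each list, so the work stays roughly linear in the number of names instead of rescanning a list once per suffix.

```python
def filter_matching(images, masks, key=lambda name: name[-4:]):
    """Keep only the names whose key occurs in both lists.

    filter_matching is a hypothetical helper; the default key (last 4
    characters) mirrors the question and assumes fixed-length suffixes.
    """
    # one pass over each list to collect its keys
    image_keys = {key(x) for x in images}
    mask_keys = {key(x) for x in masks}
    # keys present in both lists
    matches = image_keys & mask_keys
    # one more pass per list; set membership tests are O(1) on average
    imgres = [x for x in images if key(x) in matches]
    mskres = [x for x in masks if key(x) in matches]
    return imgres, mskres


images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2', 'mask_im_3']
imgres, mskres = filter_matching(images, masks)
print(imgres)  # ['image_im_1', 'image_im_2']
print(mskres)  # ['mask_im_1', 'mask_im_2']
```

Both result lists keep the original order of their input list.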
Answer:
Your solution is basically as good as it gets, but you can reduce the matching to a single pass over masks if you first store an intermediate dict, image_map, that maps each suffix to its original image name:
# map each suffix to its original image name
image_map = {x[-4:]: x for x in images}
# store the matches here, one list per kind of file
image_matches = []
mask_matches = []
# loop through the mask file names once
for mask in masks:
    # if the suffix is a key in image_map, we have a match!
    if mask[-4:] in image_map:
        # save the mask
        mask_matches.append(mask)
        # get the original image name
        image_matches.append(image_map[mask[-4:]])
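One caveat applies to both answers: x[-4:] only works while every suffix is exactly four characters long. With ~100k files, 'image_im_12345'[-4:] and 'mask_im_2345'[-4:] are both '2345', so those two would be paired incorrectly. Below is a sketch of the dict approach keyed on everything after the first underscore instead. The helper name match_by_id is hypothetical, and it assumes names follow the pattern <prefix>_<id>, where the prefix (image, mask) contains no underscore.

```python
def match_by_id(images, masks):
    """Pair image and mask names that share the same id after the first '_'.

    match_by_id is a hypothetical helper; it assumes names look like
    '<prefix>_<id>' where the prefix ('image', 'mask') has no underscore.
    """
    # map id -> original image name (one pass over images)
    image_map = {name.partition('_')[2]: name for name in images}
    image_matches, mask_matches = [], []
    # one pass over masks; dict lookups are O(1) on average
    for mask in masks:
        image_id = mask.partition('_')[2]  # 'mask_im_1' -> 'im_1'
        if image_id in image_map:
            mask_matches.append(mask)
            image_matches.append(image_map[image_id])
    return image_matches, mask_matches


images = ['image_im_1', 'image_im_2', 'image_im_12345']
masks = ['mask_im_1', 'mask_im_2', 'mask_im_3', 'mask_im_2345']
print(match_by_id(images, masks))
# (['image_im_1', 'image_im_2'], ['mask_im_1', 'mask_im_2'])
```

Here image_im_12345 and mask_im_2345 are correctly left unpaired because their full ids, im_12345 and im_2345, differ.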