Pair files using Python

Time:01-09

I have a folder with several .tif files that I would like to pair to perform some functions inside a for loop.

For example:

smp001_GFP.tif

smp001_mCherry.tif (this should be a pair)

smp002_GFP.tif

smp002_mCherry.tif (this is another pair)

I would like the for loop to iterate over each pair and perform some functions. For example:

for pair in folder:
    img_GFP = cv2.imread(pair.__contains__("GFP"))
    img_mCherry = cv2.imread(pair.__contains__("mCherry"))

I've been told that I could pair the files using dictionaries, but which strategy would you recommend?

Thanks!

CodePudding user response:

Some additional info/code would be helpful, but to give a general idea, what you can do is create a dictionary and then loop through your file names and create a new key for each numbered pair. Essentially:

pairs_dict = {}
for file_name in folder:
    # Get the prefix for the pair
    # assuming the filename format 'smp000_...'
    key = file_name.split('_')[0] # grabs 'smpXXX'
    # Then create a key in our dictionary for it. 
    pairs_dict[key] = []
...
for pair_prefix in pairs_dict:
    # 'get_file()' being whatever function the module
    # you use has for grabbing files by name
    img_GFP = get_file(pair_prefix + '_GFP.tif')
    img_mCherry = get_file(pair_prefix + '_mCherry.tif')
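A minimal, self-contained sketch of the same idea, assuming every file name follows the 'smpXXX_<channel>.tif' pattern. `group_by_prefix` is a hypothetical helper name, and the file list is hard-coded here instead of being read from a folder:

```python
def group_by_prefix(file_names):
    """Group file names by the text before the first underscore."""
    pairs = {}
    for name in file_names:
        prefix = name.split("_", 1)[0]  # 'smp001_GFP.tif' -> 'smp001'
        pairs.setdefault(prefix, []).append(name)
    return pairs


# Hard-coded stand-in for the contents of the folder (an assumption).
names = ["smp001_GFP.tif", "smp001_mCherry.tif",
         "smp002_GFP.tif", "smp002_mCherry.tif"]
pairs = group_by_prefix(names)

for prefix in sorted(pairs):
    # Rebuild both file names of the pair from the shared prefix.
    gfp_name = prefix + "_GFP.tif"
    mcherry_name = prefix + "_mCherry.tif"
    print(prefix, gfp_name, mcherry_name)
```

In real use the names would likely come from something like os.listdir(folder), and each name would be joined with the folder path before being passed to cv2.imread.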

CodePudding user response:

Nested dicts would work well. The outer dict keys (001, 002, etc.) would map to inner dicts that hold {"GFP": filename, "mCherry": filename} items. If you use defaultdict for the outer dict, it will automatically create the inner dicts on first access. Use a regular expression to get the identifiers from the file name.

import re
from collections import defaultdict
import os

tif_name_re = re.compile(r"smp(\d+)_(GFP|mCherry)\.tif")
tif_map = defaultdict(dict)

for name in os.listdir("some/directory"):
    m = tif_name_re.match(name)
    if m:
        tif_map[m.group(1)][m.group(2)] = m.group(0)

for key,value in tif_map.items():
    print(key, value)

Output

001 {'GFP': 'smp001_GFP.tif', 'mCherry': 'smp001_mCherry.tif'}
002 {'GFP': 'smp002_GFP.tif', 'mCherry': 'smp002_mCherry.tif'}
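To get back to the loop in the question, the mapping can then be consumed one pair at a time. The following is only a sketch: `iter_complete_pairs` is a hypothetical helper, the name list stands in for os.listdir, and skipping samples that lack one channel is an assumption about the desired behaviour.

```python
import re
from collections import defaultdict

TIF_NAME_RE = re.compile(r"smp(\d+)_(GFP|mCherry)\.tif")


def iter_complete_pairs(names):
    """Yield (sample_id, gfp_name, mcherry_name) for samples with both channels."""
    tif_map = defaultdict(dict)
    for name in names:
        m = TIF_NAME_RE.fullmatch(name)
        if m:
            tif_map[m.group(1)][m.group(2)] = name
    for sample_id in sorted(tif_map):
        channels = tif_map[sample_id]
        # Skip samples where one of the two files is missing.
        if "GFP" in channels and "mCherry" in channels:
            yield sample_id, channels["GFP"], channels["mCherry"]


# Stand-in for os.listdir("some/directory"); smp003 has no mCherry file.
names = ["smp001_GFP.tif", "smp001_mCherry.tif",
         "smp002_mCherry.tif", "smp002_GFP.tif",
         "smp003_GFP.tif", "notes.txt"]

for sample_id, gfp_name, mcherry_name in iter_complete_pairs(names):
    print(sample_id, gfp_name, mcherry_name)
```

Inside the loop, each name would be joined with the directory path and passed to cv2.imread, as in the question.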

CodePudding user response:

Here's a different view. Let's assume that the GFP and mCherry parts of the filenames are irrelevant but that the common part is actually that which precedes the underscore.

If that's the case then:

from glob import glob
from os.path import basename, join

DIRECTORY = './tifs' # directory contains the tif files
result = dict()
 
for filename in map(basename, glob(join(DIRECTORY, '*.tif'))):
    key, _ = filename.split('_', 1)  # split on the first underscore only
    result.setdefault(key, []).append(filename)

print(result)

Output:

{'smp002': ['smp002_mCherry.tif', 'smp002_GFP.tif'], 'smp001': ['smp001_mCherry.tif', 'smp001_GFP.tif']}

This gives us a dictionary keyed on the part of the file name before the underscore, with the files of each "pair" stored as a list under that key.
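One caveat: glob returns files in arbitrary order, so the position in each list does not tell you which channel a file is (in the output above, mCherry happens to come first). A possible follow-up step, sketched here with a hypothetical `channel_of` helper and a hard-coded copy of the dictionary, is to re-key each list by channel:

```python
from os.path import splitext


def channel_of(filename):
    """Return the part between the first underscore and the extension."""
    stem, _ext = splitext(filename)  # 'smp001_GFP.tif' -> 'smp001_GFP'
    _prefix, _sep, channel = stem.partition("_")
    return channel


# Same shape as the `result` dictionary printed above.
result = {
    "smp002": ["smp002_mCherry.tif", "smp002_GFP.tif"],
    "smp001": ["smp001_mCherry.tif", "smp001_GFP.tif"],
}

# Turn each list into a small dict keyed by channel name.
by_channel = {
    key: {channel_of(name): name for name in names}
    for key, names in result.items()
}

for key in sorted(by_channel):
    print(key, by_channel[key]["GFP"], by_channel[key]["mCherry"])
```

With this shape, each file of a pair can be looked up by channel name instead of by list index.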
