Avoiding for loops when working with folders in Python

The code below is an attempt at a minimal reproducible example. It relies on two folders (folder_source and folder_target) and two files (file_id1.csv and file_id2.csv). The code loads each CSV from one directory, changes its name, and saves it to another directory.

The code works fine. I would like to know if there is a way of avoiding the nested for loop.

Thank you!


import pandas as pd

list_of_file_paths = ['C:\\Users\\user\\Desktop\\folder_source\\file_id1.csv', 'C:\\Users\\user\\Desktop\\folder_source\\file_id2.csv']
list_of_variables = ['heat', 'patience', 'charmander']

target_path = 'C:\\Users\\user\\Desktop\\folder_target\\'

for filepath_load in list_of_file_paths:
    for variable in list_of_variables:

        df_loaded = pd.read_csv(filepath_load)  # grab one of the CSVs in the source folder

        id_number = filepath_load.split(".")[0].split("_")[-1]  # extract the id from the CSV file name

        df_loaded.to_csv(target_path + id_number + '_' + variable + '.csv', index=False)  # rename the file and save it into another folder

CodePudding user response：

I guess you're looking for the Cartesian product of the two lists?

from itertools import product

for filepath_load, variable in product(list_of_file_paths, list_of_variables):
    df_loaded = pd.read_csv(filepath_load)
    id_number = filepath_load.split(".")[0].split("_")[-1]
    df_loaded.to_csv(target_path + id_number + '_' + variable + '.csv', index=False)

But as Roland Smith says, you have some redundancy here. I'd prefer his code, which has two loops but the minimal amount of I/O and computation.
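
As a quick check that product does not change the amount of work, the sketch below compares the pairs it yields with those visited by the nested loop. The file names are shortened placeholders, not the real paths from the question.

```python
from itertools import product

# Placeholder inputs: shortened stand-ins for the lists in the question.
list_of_file_paths = ['file_id1.csv', 'file_id2.csv']
list_of_variables = ['heat', 'patience', 'charmander']

# Pairs visited by the original nested loop.
nested_pairs = [
    (filepath_load, variable)
    for filepath_load in list_of_file_paths
    for variable in list_of_variables
]

# Pairs yielded by itertools.product, in the same order.
product_pairs = list(product(list_of_file_paths, list_of_variables))

print(product_pairs == nested_pairs)  # True
print(len(product_pairs))             # 6
```

Both versions run six iterations, so product only flattens the syntax. It does not reduce the number of reads or writes.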

CodePudding user response：

If you really want to save each file as three identical copies with different names, there is really no alternative.

I would, however, move the inner loop down to remove the redundant file reads.

for filepath_load in list_of_file_paths:

    df_loaded = pd.read_csv(filepath_load)  # read each source file only once

    id_number = filepath_load.split(".")[0].split("_")[-1]

    for variable in list_of_variables:
        df_loaded.to_csv(target_path + id_number + '_' + variable + '.csv', index=False)
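
To make the saving concrete, here is a small sketch that counts loads for both loop layouts. The load function is a hypothetical stand-in for pd.read_csv, and the file names are placeholders, so nothing is read from disk.

```python
list_of_file_paths = ['file_id1.csv', 'file_id2.csv']  # placeholder names
list_of_variables = ['heat', 'patience', 'charmander']

def count_loads(hoist_read):
    """Return how many times the stand-in loader is called."""
    loads = 0

    def load(path):  # hypothetical stand-in for pd.read_csv
        nonlocal loads
        loads += 1
        return path

    for filepath_load in list_of_file_paths:
        if hoist_read:
            df_loaded = load(filepath_load)      # one read per file
        for variable in list_of_variables:
            if not hoist_read:
                df_loaded = load(filepath_load)  # one read per (file, variable) pair
    return loads

reads_nested = count_loads(hoist_read=False)
reads_hoisted = count_loads(hoist_read=True)
print(reads_nested, reads_hoisted)  # 6 2
```

With two files and three variables, the original layout loads six times and the hoisted layout loads twice. The number of writes stays at six either way.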

Additionally, consider using shutil.copy, since the source file is not modified:

import shutil

for filepath_load in list_of_file_paths:

    # no pd.read_csv needed: the file is copied as-is, not parsed
    id_number = filepath_load.split(".")[0].split("_")[-1]

    for variable in list_of_variables:
        shutil.copy(filepath_load, target_path + id_number + '_' + variable + '.csv')

That way the operating system's buffer cache can serve the source file, at least for the second and third copies.
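
For completeness, here is a minimal sketch of the same copy approach using pathlib instead of string splitting and concatenation. The helper name copy_with_variable_names is made up for illustration, and the demo runs in a temporary directory with dummy CSV contents rather than the real folders from the question.

```python
import shutil
import tempfile
from pathlib import Path

def copy_with_variable_names(source_files, variables, target_dir):
    """Copy each source file once per variable as '<id>_<variable>.csv'."""
    target_dir = Path(target_dir)
    created = []
    for source in map(Path, source_files):
        # 'file_id1.csv' -> stem 'file_id1' -> id 'id1'
        id_number = source.stem.split('_')[-1]
        for variable in variables:
            destination = target_dir / f'{id_number}_{variable}.csv'
            shutil.copy(source, destination)
            created.append(destination.name)
    return created

# Demo in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    folder_source = Path(tmp) / 'folder_source'
    folder_target = Path(tmp) / 'folder_target'
    folder_source.mkdir()
    folder_target.mkdir()
    for name in ('file_id1.csv', 'file_id2.csv'):
        (folder_source / name).write_text('a,b\n1,2\n')  # dummy contents

    created = copy_with_variable_names(
        sorted(folder_source.glob('*.csv')),
        ['heat', 'patience', 'charmander'],
        folder_target,
    )
    print(created)
```

Using Path.stem takes the id from the file name alone, so a dot or underscore elsewhere in the directory path cannot affect the result.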