The code below is an attempt at a minimal reproducible example. It relies on two folders (folder_source and folder_target) and two files (file_id1.csv and file_id2.csv). The code loads a CSV from one directory, changes its name, and saves it to another directory.
The code works fine. I would like to know if there is a way of avoiding the nested for loop.
Thank you!
import pandas as pd

list_of_file_paths = ['C:\\Users\\user\\Desktop\\folder_source\\file_id1.csv',
                      'C:\\Users\\user\\Desktop\\folder_source\\file_id2.csv']
list_of_variables = ['heat', 'patience', 'charmander']
target_path = 'C:\\Users\\user\\Desktop\\folder_target\\'

for filepath_load in list_of_file_paths:
    for variable in list_of_variables:
        df_loaded = pd.read_csv(filepath_load)  # load one CSV from the source folder
        id_number = filepath_load.split(".")[0].split("_")[-1]  # extract the id (e.g. 'id1') from the file name
        df_loaded.to_csv(target_path + id_number + '_' + variable + '.csv', index=False)  # save under the new name in the target folder
CodePudding user response:
It sounds like you are looking for the Cartesian product of the two lists:
from itertools import product

for filepath_load, variable in product(list_of_file_paths, list_of_variables):
    df_loaded = pd.read_csv(filepath_load)
    id_number = filepath_load.split(".")[0].split("_")[-1]
    df_loaded.to_csv(target_path + id_number + '_' + variable + '.csv', index=False)
But as Roland Smith says, there is some redundancy here: each CSV is read again for every variable. I'd prefer his code, which keeps two loops but does the minimum amount of I/O and computation.
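As a quick check of what the product-based loop iterates over, here is a minimal sketch showing that itertools.product visits the same (file, variable) pairs, in the same order, as the nested loop. The short file names are hypothetical stand-ins for the full paths in the question, and no files are read or written.

```python
from itertools import product

# Hypothetical stand-ins for the lists in the question.
file_paths = ["file_id1.csv", "file_id2.csv"]
variables = ["heat", "patience", "charmander"]

# Pairs visited by the explicit nested loop.
nested_pairs = []
for path in file_paths:
    for variable in variables:
        nested_pairs.append((path, variable))

# Pairs yielded by itertools.product; the rightmost iterable advances fastest.
product_pairs = list(product(file_paths, variables))

print(product_pairs == nested_pairs)  # True
print(len(product_pairs))             # 6
```

So product only flattens the syntax. The amount of work per pair is unchanged, which is why the read inside the loop body still runs six times.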
CodePudding user response:
If you really want to save each file as three identical copies under different names, there is really no alternative.
However, I would move the inner loop down, which removes the redundant file reads.
for filepath_load in list_of_file_paths:
    df_loaded = pd.read_csv(filepath_load)  # read each source file only once
    id_number = filepath_load.split(".")[0].split("_")[-1]
    for variable in list_of_variables:
        df_loaded.to_csv(target_path + id_number + '_' + variable + '.csv', index=False)
Additionally, consider using shutil.copy, since the source file is not modified:
import shutil

for filepath_load in list_of_file_paths:
    # no pd.read_csv needed: the file is copied as-is
    id_number = filepath_load.split(".")[0].split("_")[-1]
    for variable in list_of_variables:
        shutil.copy(filepath_load, target_path + id_number + '_' + variable + '.csv')
That would make use of the operating system's buffer cache, at least for the second and third copy of each file.
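For what it's worth, the same copy-based approach can be sketched with pathlib, which avoids building paths by string concatenation. This is only an illustration under stated assumptions: the helper name copy_with_suffixes is made up for this example, and the demo runs in a temporary directory with two tiny placeholder CSVs rather than the folders from the question.

```python
import shutil
import tempfile
from pathlib import Path


def copy_with_suffixes(source_paths, variables, target_dir):
    """Copy each source file once per variable, naming copies '<id>_<variable>.csv'."""
    target_dir = Path(target_dir)
    created = []
    for source in map(Path, source_paths):
        # 'file_id1.csv' -> stem 'file_id1' -> id 'id1'
        id_number = source.stem.split("_")[-1]
        for variable in variables:
            destination = target_dir / f"{id_number}_{variable}.csv"
            shutil.copy(source, destination)
            created.append(destination)
    return created


# Demo in a temporary directory so the sketch runs without the question's folders.
with tempfile.TemporaryDirectory() as tmp:
    source_dir = Path(tmp) / "folder_source"
    target_dir = Path(tmp) / "folder_target"
    source_dir.mkdir()
    target_dir.mkdir()

    sources = []
    for name in ("file_id1.csv", "file_id2.csv"):
        path = source_dir / name
        path.write_text("a,b\n1,2\n")  # placeholder CSV content
        sources.append(path)

    created = copy_with_suffixes(sources, ["heat", "patience", "charmander"], target_dir)
    created_names = sorted(p.name for p in created)
    copies_match = all(p.read_text() == "a,b\n1,2\n" for p in created)

print(created_names)
```

Path.stem drops the extension before the id is split off, so the extraction no longer depends on where the first "." appears in the full path.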