Save training and test set .csv files with custom names

Time:09-21

I have ten datasets which I have split into training and test sets:

names=["df1","df2","df3", "df4", "df5", "df6", "df7", "df8", "df9", "df10"]
dataset_list = []
for i in range(len(names)):
    datasets = pd.read_csv(f"{fulldatafolder}/" + names[i] + "_Full_Dataset.csv")
    dataset_list.append(datasets)
training_set_list=list()
test_set_list=list()
for dataset in dataset_list:
    training_sets, test_sets=np.split(dataset, [int(.90*len(dataset))])
    training_set_list.append(training_sets)
    test_set_list.append(test_sets)

However, if I try to save all these datasets to their respective folders as follows:

for names, dataset in enumerate(training_set_list):
    dataset.to_csv(f"{trainingfolder}/{format(names)}_Training_Set.csv", index=False, sep=",")
for names, dataset in enumerate(test_set_list):
    dataset.to_csv(f"{testfolder}/{format(names)}_Test_Set.csv", index=False, sep=",")

I get the .csv files with a number (0,...,9) in front of "_Training_Set.csv" and "_Test_Set.csv" instead of their names "df1",...,"df10" specified in the list names. How can I fix this?

Answer:

enumerate yields (counter, value) pairs, so the first loop variable receives 0, 1, ..., 9. Because you called that variable names, the loop rebinds names to the counter and overwrites your original list of names; a for loop in Python does not create a new scope. The loop never iterates over the original list, which is why the numbers end up in the filenames.
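To make the rebinding visible, here is a small sketch that uses placeholder strings instead of DataFrames (the values "a", "b", "c" are stand-ins, not data from the question):

```python
names = ["df1", "df2", "df3"]
datasets = ["a", "b", "c"]  # placeholders standing in for the DataFrames

# enumerate pairs each item with its position, not with an entry from `names`.
pairs = list(enumerate(datasets))

# Reusing `names` as the loop variable overwrites the list with the counter.
written = []
for names, dataset in enumerate(datasets):
    written.append(f"{names}_Training_Set.csv")

print(pairs)    # the (counter, value) pairs
print(written)  # filenames built from the counter
print(names)    # no longer a list: it holds the last counter value
```

After the loop, names is the integer 2, so any later code that expects the list of names would also break.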

If you want to loop through both lists, you could use zip():

for name, dataset in zip(names, training_set_list):
    dataset.to_csv(f"{trainingfolder}/{name}_Training_Set.csv", index=False)
for name, dataset in zip(names, test_set_list):
    dataset.to_csv(f"{testfolder}/{name}_Test_Set.csv", index=False)
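One caveat: zip() stops at the shorter of its inputs without raising an error, so a dataset that failed to load would silently shift or drop names. A minimal sketch of a guard, assuming a hypothetical helper build_output_paths that is not part of the original code:

```python
from pathlib import Path


def build_output_paths(folder, names, datasets, suffix):
    """Pair each dataset with its output path; hypothetical helper for illustration."""
    # zip() would silently truncate to the shorter input, so check lengths first.
    if len(names) != len(datasets):
        raise ValueError(
            f"got {len(names)} names but {len(datasets)} datasets"
        )
    return [
        (Path(folder) / f"{name}_{suffix}.csv", dataset)
        for name, dataset in zip(names, datasets)
    ]


# Possible usage with the variables from the question:
# for path, dataset in build_output_paths(trainingfolder, names,
#                                         training_set_list, "Training_Set"):
#     dataset.to_csv(path, index=False)
```

On Python 3.10 or newer, zip(names, datasets, strict=True) raises a ValueError on a length mismatch and could replace the explicit check.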

In addition to that, I would change your first loop from:

for i in range(len(names)):
    datasets = pd.read_csv(f"{fulldatafolder}/" + names[i] + "_Full_Dataset.csv")

to:

for name in names:
    datasets = pd.read_csv(f"{fulldatafolder}/{name}_Full_Dataset.csv")

There is no need to build a range when you can loop over the list directly. Also, since you already use an f-string, it is cleaner to put the variable inside the string than to concatenate the pieces with the "+" sign.
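As a quick check that the two styles produce the same paths, here is a sketch with an assumed folder name ("data/full" is hypothetical; the question does not say what fulldatafolder contains):

```python
fulldatafolder = "data/full"  # hypothetical value, for illustration only
names = ["df1", "df2", "df3"]

# Index-based loop with string concatenation, as in the question.
concatenated = [
    f"{fulldatafolder}/" + names[i] + "_Full_Dataset.csv"
    for i in range(len(names))
]

# Direct iteration with the name interpolated inside the f-string.
interpolated = [f"{fulldatafolder}/{name}_Full_Dataset.csv" for name in names]

print(concatenated == interpolated)
```

Both lists are identical, so the change affects readability only, not behaviour.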
