I want to import multiple CSV files into a dictionary. Unfortunately, my solution is very slow. How could I optimize this code?
Thank you in advance! :)
import os
import pandas as pd

dats = os.listdir(path)  # file names
dat_names = [i.split(sep="_")[0] for i in dats]  # should be the keys in the dict
PFC_Dict = {}
i = 0
while i < len(dats):
    PFC_Dict[dat_names[i]] = pd.read_csv(os.path.join(path, dats[i]), sep=";", parse_dates=True, index_col=0, names=["Preis"], decimal=",", dayfirst=True).resample("15min").ffill()
    i =+ 1  # note: "=+ 1" sets i to +1 on every pass (endless loop); "i += 1" was intended
Edit: Additional information:
- Number of import files: ~10 files.
- Size of files: ~1 MB each; shape of each CSV: (160000, 1)
- Context:
The result of the analysis should be a DataFrame of the following form:
- index representing the file name
- columns representing different scenarios of the calculation (different parameters)
The files consist of a datetime index and the corresponding prices. The files have different start dates and different prices, since these are forecasts.
I will merge these DataFrames with other data depending on their start dates. With a separate DataFrame for each file I can find its start date easily, since it is index[0]. If I had one DataFrame for all files, I thought it would not be as easy to find the start date of each file.
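To illustrate the trade-off described above, here is a minimal sketch with made-up data (the keys `fileA`/`fileB` and all values are hypothetical). With a dict of DataFrames, the start date is `df.index[0]`; if the frames are instead combined with `pd.concat`, the file name becomes an outer index level and a group-wise `min()` still recovers each file's start date.

```python
import pandas as pd

# Two made-up forecast series with different start dates (illustration only).
pfc_dict = {
    "fileA": pd.DataFrame(
        {"Preis": [1.0, 2.0, 3.0]},
        index=pd.date_range("2021-01-01", periods=3, freq="15min"),
    ),
    "fileB": pd.DataFrame(
        {"Preis": [4.0, 5.0]},
        index=pd.date_range("2021-02-01", periods=2, freq="15min"),
    ),
}

# Separate DataFrames: the start date is the first index entry
# (this assumes each index is sorted in ascending order).
start_dates = {name: df.index[0] for name, df in pfc_dict.items()}

# One combined DataFrame: concat turns the dict keys into the outer index
# level "file"; the earliest timestamp per file is then a group-wise min().
combined = pd.concat(pfc_dict, names=["file", "timestamp"])
start_dates_combined = (
    combined.reset_index(level="timestamp")
    .groupby(level="file")["timestamp"]
    .min()
    .to_dict()
)
```

Either layout works: the dict keeps per-file lookups trivial, while the combined frame makes cross-file operations such as a single `groupby` easier.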
Answer:
Reading a CSV is slow because CSV is a text format meant to be readable by humans, so every value has to be parsed from text. A binary format such as .feather is much faster to read. Luckily, pandas has built-in support for feather files (it requires the pyarrow package):
- .read_csv() --> .read_feather()
- .to_csv() --> .to_feather()
Run a script once that converts all the .csv files to .feather. To do this, loop through all the CSV files, read each one with pd.read_csv(), and export it with df.to_feather().
When you run your code now, it should read the .feather files much faster. In my case, a data file that took 30 seconds to read as a CSV took 1 to 2 seconds to read as a feather file.
Answer:
The problem wasn't the speed of the code or the size of the data; the code was simply wrong. I looped through the list of import files while i < len(import_list), incrementing the counter with:
i =+ 1
Obviously it should be i += 1. With =+ 1, i is set to +1 on every pass, so the loop never ended.
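Besides swapping `=+` for `+=`, iterating over the file names directly removes the manual counter altogether. A minimal sketch, assuming the question's CSV layout; the helper name `load_csv_dir` and the explicit column names `timestamp`/`Preis` are hypothetical (the question passed only `names=["Preis"]`):

```python
import os
import tempfile

import pandas as pd


def load_csv_dir(path):
    """Hypothetical helper: read every CSV in `path` into a dict keyed by
    the part of the file name before the first "_"."""
    pfc_dict = {}
    # Looping over the names directly means there is no counter to
    # increment, so the "=+" versus "+=" mistake cannot happen.
    for file_name in sorted(os.listdir(path)):
        if not file_name.endswith(".csv"):
            continue
        key = file_name.split("_")[0]
        pfc_dict[key] = (
            pd.read_csv(
                os.path.join(path, file_name),
                sep=";",
                decimal=",",
                names=["timestamp", "Preis"],  # assumed: timestamp + price columns
                index_col=0,
                parse_dates=True,
                dayfirst=True,
            )
            .resample("15min")
            .ffill()
        )
    return pfc_dict


# Demo with one made-up CSV file in the question's format.
path = tempfile.mkdtemp()
with open(os.path.join(path, "scenario1_2021.csv"), "w") as f:
    f.write("13.01.2021 00:00;12,5\n13.01.2021 01:00;13,0\n")

PFC_Dict = load_csv_dir(path)
```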
Thank you for all your replies!