Python faster import csv to dict


I want to import multiple csv files into a dictionary. Unfortunately my solution is very slow. How could I optimize this code?

Thank you in advance! :)

import os
import pandas as pd

dats = os.listdir(path)  # file names
dat_names = [i.split("_")[0] for i in dats]  # should be key in dict
PFC_Dict = {}
i = 0
while i < len(dats):
    PFC_Dict[dat_names[i]] = pd.read_csv(path + dats[i], sep=";", parse_dates=True,
                                         index_col=0, names=["Preis"], decimal=",",
                                         dayfirst=True).resample("15min").ffill()
    i =+ 1

Edit: Additional information:

  • Number of import files: ~10.
  • Size of files: ~1 MB each; shape of each CSV: (160000, 1).
  • Context:

Result of the analysis should be a dataframe in the following form:

  • index representing the file name
  • columns representing different scenarios of the calculation (different parameters)

The files consist of a datetime index & corresponding prices. The files have different start dates and different prices, since these are forecasts.

I will merge these dataframes on different dates depending on their start dates. With a separate dataframe for each file I can find its start date easily, since it is index[0]. If I had one dataframe for all files instead, I think it would not be that easy to find the start date of each file.
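For illustration, with the dictionary built by the code above, the start dates can be collected in one pass (a minimal sketch; start_dates is a hypothetical name):

# first timestamp of each file's datetime index = that forecast's start date
start_dates = {name: df.index[0] for name, df in PFC_Dict.items()}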

CodePudding user response:

Reading csv is slow because csv is a text format meant to be readable by humans. A much faster alternative is the binary .feather format, and pandas has built-in support for it:

.read_csv() --> .read_feather()

.to_csv() --> .to_feather()

Run a script once that converts all the .csv files to .feather: loop through the csv files, read each one with pd.read_csv(), and then export it using df.to_feather().
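A one-off conversion script could look like this (a minimal sketch: the read_csv arguments mirror the question's code, path is a hypothetical folder name, and feather support requires pyarrow to be installed):

import os
import pandas as pd

path = "csv_data/"  # hypothetical folder holding the forecast CSV files

for file_name in os.listdir(path):
    if not file_name.endswith(".csv"):
        continue
    df = pd.read_csv(path + file_name, sep=";", parse_dates=True, index_col=0,
                     names=["Preis"], decimal=",", dayfirst=True)
    # feather cannot serialize a non-default index, so move it into a column first
    df.reset_index().to_feather(path + file_name.replace(".csv", ".feather"))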

When you run your code now, it should read the .feather files much faster. In my case, a data file that took 30 seconds to read as a csv took 1 to 2 seconds to read as a feather file.
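The loading loop from the question would then change only in the reader call (again a sketch; the datetime index is restored after reading because it was reset before writing):

PFC_Dict = {}
for file_name in os.listdir(path):
    if not file_name.endswith(".feather"):
        continue
    df = pd.read_feather(path + file_name)
    # restore the datetime index that was moved into a column for feather
    df = df.set_index(df.columns[0])
    PFC_Dict[file_name.split("_")[0]] = df.resample("15min").ffill()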

CodePudding user response:

The problem was neither the speed of the code nor the size of the data; the code itself was wrong. I looped through the list of import files with while i < len(import_list) and incremented the counter with:

i =+ 1

Obviously it should be i += 1. The typo i =+ 1 just assigns +1 to i on every pass, so the loop never terminates.
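As an aside, the manual index bookkeeping can be dropped entirely by iterating over the file names directly (a sketch reusing the names and read_csv arguments from the question):

PFC_Dict = {}
for dat in os.listdir(path):
    # key is the part of the file name before the first underscore
    PFC_Dict[dat.split("_")[0]] = pd.read_csv(path + dat, sep=";", parse_dates=True,
                                              index_col=0, names=["Preis"], decimal=",",
                                              dayfirst=True).resample("15min").ffill()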

Thank you for all your replies!
