How to read a very large csv file into a pandas dataframe as quickly as possible?

Time:09-15

I am reading a very large csv file (~1 million rows) into a pandas DataFrame using the pd.read_csv() function with the following options. (Note that the Timestamp column also contains seconds, though they are not shown here because this is an exact copy and paste from the csv file.)


import pandas as pd

df = pd.read_csv(file,
                 index_col='Timestamp',
                 engine='c',
                 na_filter=False,
                 parse_dates=['Timestamp'],
                 infer_datetime_format=True,
                 low_memory=True)

My question is: how can I speed up the read? At the moment it takes a very long time to load the file.
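One common bottleneck with a setup like this is datetime parsing: letting pandas infer the format for every row is slow. A sketch of an alternative, assuming the timestamps all share one fixed format (the sample data and column names here are made up for illustration), is to read the column as plain text and convert it afterwards with an explicit format string:

```python
import io
import pandas as pd

# Small in-memory CSV standing in for the large file (hypothetical data).
csv_data = io.StringIO(
    "Timestamp,Value\n"
    "2021-01-01 00:00:00,1.5\n"
    "2021-01-01 00:00:01,2.5\n"
)

# Read without parse_dates, then convert with an explicit format.
# Supplying format= skips per-row format inference entirely.
df = pd.read_csv(csv_data)
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%Y-%m-%d %H:%M:%S")
df = df.set_index("Timestamp")
```

This only helps if the format really is uniform; if the file mixes formats, pd.to_datetime will raise and you are back to inference.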

CodePudding user response:

dask appears to read .csv files faster than a plain pandas DataFrame load, and the syntax stays very similar.

The answer to this question shows how to use dask for this:

How to speed up loading data using pandas?

I also use this method when working with .csv files and performance is an issue.
