If a file fed into pandas read_csv is too large, will it raise an exception? What I'm afraid of is that it will just read what it can, say the first 1,000,000 rows, and proceed as if there were no problem.
Are there situations in which pandas will fail to read all records in a file but also fail to raise an exception (or print an error)?
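For what it's worth, here is a rough sketch of how I imagine checking this myself (big.csv and the chunk size are hypothetical): stream the file in chunks and compare the row count against a raw line count.
import pandas as pd

path = 'big.csv'  # hypothetical file name

# Stream the CSV in chunks so it never has to fit in memory at once
rows_read = 0
for chunk in pd.read_csv(path, chunksize=1_000_000):
    rows_read += len(chunk)

# Compare against a raw line count (minus one for the header);
# this assumes no embedded newlines inside quoted fields
with open(path) as f:
    raw_lines = sum(1 for _ in f)

print(rows_read, raw_lines - 1)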
CodePudding user response:
I had issues with pandas once when I tried to open a very large dataset and my kernel crashed. I eventually used PySpark instead. It is not hard to use, and you can easily move data between PySpark and pandas.
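A minimal sketch of that workflow (the file name and column are hypothetical, and a local Spark installation is assumed):
from pyspark.sql import SparkSession

# Start (or reuse) a local Spark session
spark = SparkSession.builder.appName('large-csv').getOrCreate()

# Spark reads the CSV lazily and spreads the work across cores
sdf = spark.read.csv('data.csv', header=True, inferSchema=True)

# Reduce the data in Spark first, then bring only the small result into pandas
small_pdf = sdf.select('some_column').limit(1000).toPandas()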
CodePudding user response:
If you have a large dataset and you want to read it many times, I recommend using a .pkl file.
Or you can wrap the read in a try/except block.
However, if you still want to use a CSV file, you can visit this link and find a solution: How do I read a large csv file with pandas?
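For example, a rough sketch (the file names are hypothetical, and catching MemoryError is just one option) of converting the CSV to a pickle once and reloading it afterwards:
import pandas as pd

try:
    # One-time conversion: read the CSV, then store it as a pickle
    df = pd.read_csv('data.csv')
    df.to_pickle('data.pkl')
except MemoryError:
    print('File too large to read in one go; consider chunksize= or another tool')

# Subsequent reads from the pickle are much faster than re-parsing the CSV
df = pd.read_pickle('data.pkl')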
CodePudding user response:
I'd recommend using dask, which is a high-level library that supports parallel computing. You can easily import all your data, but it won't be loaded into memory:
import dask.dataframe as dd

# Builds a lazy dataframe over the CSV; nothing is loaded into memory yet
df = dd.read_csv('data.csv')
df
and from there, you can compute only the selected columns/rows you are interested in:
# 'columns' and 'indices_to_select' are placeholders for your own labels
df_selected = df[columns].loc[indices_to_select]
df_selected.compute()  # runs the actual work and returns a pandas DataFrame
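Aggregations work the same way; as a rough usage sketch (the column name is hypothetical), nothing is read until .compute() is called:
# Continues from the dask dataframe `df` above; only the scalar result is materialized
mean_value = df['some_column'].mean().compute()
print(mean_value)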