I am currently in the process of getting the data from my stakeholder, who has a database from which he is going to extract a CSV file.
From there he is going to upload it to a shared drive, and I will download the data and use it locally as the source to import into a pandas DataFrame.
The approximate size will be 40 million rows. I was wondering whether the data can be exported as a single CSV file from the SQL database and that CSV used as the source for the Python DataFrame, or whether it should be split into chunks, as I am not sure what the row limitation of a CSV file is.
I don't think RAM and processing should be an issue at this time.
Your help is much appreciated. Cheers!
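For context, my rough plan on the Python side is nothing more than the sketch below (the file name is a placeholder for whatever I download from the shared drive); my question is really whether a single read like this will cope with 40 million rows:

```python
import pandas as pd

# Placeholder name for the CSV downloaded from the shared drive.
df = pd.read_csv("stakeholder_export.csv")
print(len(df))
```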
CodePudding user response:
If you can't connect directly to the database, you might need the .db file. I'm not sure a CSV will even be able to handle more than a million or so rows.
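If it does end up as a SQLite .db file, pandas can read from it directly and skip the CSV step entirely. A minimal sketch, assuming SQLite; the file name and table name below are placeholders for whatever the stakeholder actually provides:

```python
import sqlite3
import pandas as pd

# Placeholder .db file and table name.
conn = sqlite3.connect("stakeholder_export.db")

chunks = []
# chunksize makes read_sql_query yield DataFrames of 1,000,000 rows at a time,
# so the 40 million rows never have to be parsed in a single pass.
for chunk in pd.read_sql_query("SELECT * FROM my_table", conn, chunksize=1_000_000):
    chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)
conn.close()
```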
CodePudding user response:
"...as I am not sure what the row limitation of a CSV file is."
There is no such limit inherent in the CSV format, if by CSV you mean the format defined by RFC 4180, which stipulates that a CSV file is

file = [header CRLF] record *(CRLF record) [CRLF]

where [...] denotes an optional part, CRLF denotes a carriage return-line feed (\r\n), and *(...) denotes a part repeated zero or more times.
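So the row count itself is not a CSV problem; the practical limit is the memory needed to hold the parsed DataFrame. If a single 40-million-row file turns out to be too large to load in one go, pandas can read the same CSV in chunks. A rough sketch, with the file name and chunk size as placeholder values:

```python
import pandas as pd

# Placeholder path to the single CSV exported from the database.
csv_path = "stakeholder_export.csv"

chunks = []
# Read 2 million rows at a time instead of parsing the whole file at once.
for chunk in pd.read_csv(csv_path, chunksize=2_000_000):
    # Any per-chunk filtering or aggregation could go here to keep memory down.
    chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)
print(df.shape)
```

If memory ever did become a concern, aggregating or filtering inside the loop, rather than concatenating everything, would keep only the reduced result in RAM.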