I have a database that contains all the data from a CSV I downloaded from a botanic website, and I want to keep it updated whenever a new version of this CSV comes out. My problem is that the CSV is extremely large: it contains at least 1 million records, and saving everything takes about an hour on average. Any ideas on how I can perform this update without rewriting the entire thing? TYSM!
Some extra info:
- It's a Node.js project, and I'm using Prisma as the ORM
- It's actually a txt file that I parse as a CSV, since the values are tab-separated
- I'm using PostgreSQL
- Currently I'm dropping the table and re-inserting everything from scratch
CodePudding user response:
You don't mention how you load the data into postgres, but you could use the COPY command, something like this:
COPY mytable FROM '/tmp/myfile.csv' (FORMAT csv, header TRUE);
It should be a lot faster than inserting the rows one by one through the ORM.
https://www.postgresql.org/docs/current/sql-copy.html
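If the file lives on the application server rather than the database server, COPY ... FROM STDIN works too, and you can stream it from Node. Here is a minimal sketch using the pg and pg-copy-streams packages alongside Prisma; the table name plants and the path /data/plants.txt are placeholders, and DELIMITER E'\t' matches the tab-separated source:

import { createReadStream } from "node:fs";
import { pipeline } from "node:stream/promises";
import { Client } from "pg";
import { from as copyFrom } from "pg-copy-streams";

async function bulkLoad() {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    // COPY ... FROM STDIN reads the rows we pipe over this connection,
    // so the file only needs to be readable by the Node process.
    const copyStream = client.query(
      copyFrom("COPY plants FROM STDIN WITH (FORMAT csv, DELIMITER E'\\t', HEADER true)")
    );
    await pipeline(createReadStream("/data/plants.txt"), copyStream);
  } finally {
    await client.end();
  }
}

bulkLoad().catch(console.error);

Even if you keep the full reload for now, copying into a staging table and swapping it in inside a transaction is a common way to keep the old data queryable while the import runs.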
Another option is to use file_fdw, if you only need to query the data and don't need to update it:
https://www.postgresql.org/docs/current/file-fdw.html
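Since the project already uses Prisma, the one-time setup could be run as raw SQL, roughly like the sketch below. The server name, foreign table name, and columns are illustrative, creating the extension usually requires superuser rights, and the file has to be readable by the PostgreSQL server process:

import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// One-time setup: expose the tab-separated file as a read-only foreign table.
async function setupForeignTable() {
  await prisma.$executeRawUnsafe("CREATE EXTENSION IF NOT EXISTS file_fdw");
  await prisma.$executeRawUnsafe(
    "CREATE SERVER IF NOT EXISTS plant_files FOREIGN DATA WRAPPER file_fdw"
  );
  await prisma.$executeRawUnsafe(`
    CREATE FOREIGN TABLE IF NOT EXISTS plants_csv (
      id text,
      scientific_name text,
      family text
    ) SERVER plant_files
    OPTIONS (filename '/data/plants.txt', format 'csv', header 'true', delimiter E'\\t')
  `);
}

setupForeignTable().catch(console.error);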
CodePudding user response:
You would essentially need to diff the two CSV versions, find the records that have changed, and update only those records.
You could use any text/CSV comparison npm library to find the differences, and once you have the records that changed, you can issue update queries for just those rows.
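A rough sketch of that idea, assuming each row has a stable unique key in its first column (called id here) and a Prisma model named plant with made-up fields; at a million rows you would want to batch these writes (e.g. createMany/updateMany inside transactions) rather than issue them one at a time:

import { readFile } from "node:fs/promises";
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Parse the tab-separated file into a Map keyed by the first column (the id).
async function loadRows(path: string): Promise<Map<string, string[]>> {
  const text = await readFile(path, "utf8");
  const rows = new Map<string, string[]>();
  for (const line of text.split("\n").slice(1)) { // skip the header row
    if (!line.trim()) continue;
    const cols = line.split("\t");
    rows.set(cols[0], cols);
  }
  return rows;
}

async function syncFromCsv(oldPath: string, newPath: string) {
  const oldRows = await loadRows(oldPath);
  const newRows = await loadRows(newPath);

  for (const [id, cols] of newRows) {
    const before = oldRows.get(id);
    // Insert new rows and update rows whose content changed.
    if (!before || before.join("\t") !== cols.join("\t")) {
      await prisma.plant.upsert({
        where: { id },
        create: { id, scientificName: cols[1], family: cols[2] },
        update: { scientificName: cols[1], family: cols[2] },
      });
    }
  }

  // Remove rows that disappeared from the new version.
  for (const id of oldRows.keys()) {
    if (!newRows.has(id)) await prisma.plant.delete({ where: { id } });
  }
}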