I have a database that contains all the data from a CSV I downloaded from a botanic website, and I want to keep it updated whenever a new version of this CSV comes out. My problem is that the CSV is extremely large: it contains at least 1 million records, and saving everything takes about an hour on average. Any ideas on how I can perform this update without rewriting the entire thing? TYSM!
Some extra info:
- It's a Node.js project, and I'm using Prisma as the ORM
- It's actually a txt file that I parse as a CSV, since the values are tab-separated
- I'm using PostgreSQL
- Currently I'm dropping the table and re-inserting everything from scratch
CodePudding user response:
You don't mention how you load the data into postgres, but you could use the COPY command, something like this:
COPY mytable FROM '/tmp/myfile.csv' (FORMAT csv, header TRUE);
It should be a lot faster than inserting the rows one by one through the ORM.
https://www.postgresql.org/docs/current/sql-copy.html
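If the file lives on the application server rather than the database server, COPY ... FROM STDIN works too, and you can stream it from Node. Here is a minimal sketch using the pg and pg-copy-streams packages alongside Prisma; the table name plants and the path /data/plants.txt are placeholders, and DELIMITER E'\t' matches the tab-separated source:

import { createReadStream } from "node:fs";
import { pipeline } from "node:stream/promises";
import { Client } from "pg";
import { from as copyFrom } from "pg-copy-streams";

async function bulkLoad() {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    // COPY ... FROM STDIN reads the rows we pipe over this connection,
    // so the file only needs to be readable by the Node process.
    const copyStream = client.query(
      copyFrom("COPY plants FROM STDIN WITH (FORMAT csv, DELIMITER E'\\t', HEADER true)")
    );
    await pipeline(createReadStream("/data/plants.txt"), copyStream);
  } finally {
    await client.end();
  }
}

bulkLoad().catch(console.error);

Even if you keep the full reload for now, copying into a staging table and swapping it in inside a transaction is a common way to keep the old data queryable while the import runs.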
Another option is to use file_fdw, if you only need to query the data and don't need to update it:
https://www.postgresql.org/docs/current/file-fdw.html
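Since the project already uses Prisma, the one-time setup could be run as raw SQL, roughly like the sketch below. The server name, foreign table name, and columns are illustrative, creating the extension usually requires superuser rights, and the file has to be readable by the PostgreSQL server process:

import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// One-time setup: expose the tab-separated file as a read-only foreign table.
async function setupForeignTable() {
  await prisma.$executeRawUnsafe("CREATE EXTENSION IF NOT EXISTS file_fdw");
  await prisma.$executeRawUnsafe(
    "CREATE SERVER IF NOT EXISTS plant_files FOREIGN DATA WRAPPER file_fdw"
  );
  await prisma.$executeRawUnsafe(`
    CREATE FOREIGN TABLE IF NOT EXISTS plants_csv (
      id text,
      scientific_name text,
      family text
    ) SERVER plant_files
    OPTIONS (filename '/data/plants.txt', format 'csv', header 'true', delimiter E'\\t')
  `);
}

setupForeignTable().catch(console.error);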
CodePudding user response:
You would essentially need to diff the two CSV versions, find the records that have changed, and update only those records.
You could use any text/CSV comparison npm library to find the differences, and once you have the records that changed, you can issue update queries for just those rows.
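A rough sketch of that idea, assuming each row has a stable unique key in its first column (called id here) and a Prisma model named plant with made-up fields; at a million rows you would want to batch these writes (e.g. createMany/updateMany inside transactions) rather than issue them one at a time:

import { readFile } from "node:fs/promises";
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Parse the tab-separated file into a Map keyed by the first column (the id).
async function loadRows(path: string): Promise<Map<string, string[]>> {
  const text = await readFile(path, "utf8");
  const rows = new Map<string, string[]>();
  for (const line of text.split("\n").slice(1)) { // skip the header row
    if (!line.trim()) continue;
    const cols = line.split("\t");
    rows.set(cols[0], cols);
  }
  return rows;
}

async function syncFromCsv(oldPath: string, newPath: string) {
  const oldRows = await loadRows(oldPath);
  const newRows = await loadRows(newPath);

  for (const [id, cols] of newRows) {
    const before = oldRows.get(id);
    // Insert new rows and update rows whose content changed.
    if (!before || before.join("\t") !== cols.join("\t")) {
      await prisma.plant.upsert({
        where: { id },
        create: { id, scientificName: cols[1], family: cols[2] },
        update: { scientificName: cols[1], family: cols[2] },
      });
    }
  }

  // Remove rows that disappeared from the new version.
  for (const id of oldRows.keys()) {
    if (!newRows.has(id)) await prisma.plant.delete({ where: { id } });
  }
}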