Compare tables to ensure non-regression in PostgreSQL


Here is my issue: I often need to compare the same PostgreSQL tables (or views that depend on them) before and after some ETL code refactoring, to check that my changes introduce no regressions.

Let's say I have an ETL job I want to refactor, one that regularly loads data into a table. Currently, once my modifications are done, I first download the table's data from PostgreSQL as a .csv file, then empty the table, fill it again with my refactored code, and download the data once more. Then I compare the two .csv files, for instance with Python in a Jupyter Notebook.
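
The comparison step in the notebook is roughly the following (a minimal sketch with pandas; the file names and the id key column are placeholders):

    import pandas as pd

    # Load the two CSV snapshots (file names are placeholders)
    before = pd.read_csv("before.csv")
    after = pd.read_csv("after.csv")

    # Sort on a stable key so row order alone does not register as a difference
    key = ["id"]  # hypothetical primary-key column
    before = before.sort_values(key).reset_index(drop=True)
    after = after.sort_values(key).reset_index(drop=True)

    # DataFrame.compare needs identical shape and columns, so check that first
    assert list(before.columns) == list(after.columns), "column sets differ"
    assert len(before) == len(after), "row counts differ"

    # Cell-by-cell comparison; an empty result means no regression
    diff = before.compare(after)
    print("No differences found" if diff.empty else diff)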

That does not seem like the right way to go at all. Among other things, it assumes I am the only one using that table during the operation, and there are so many other problems I can't list them all here.

Is there a better way to go about this?

CodePudding user response:

It sounds to me like you have the correct approach. There's no magic to the CSV export operation: whatever tool you use runs a query and formats its result set into the file. Any other before-and-after comparison would have to run the same query.
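
To make that concrete, here is roughly what such an export amounts to with psycopg2 (only a sketch; the connection string, table name and id column are assumptions):

    import psycopg2

    # Placeholder connection string and table/column names
    conn = psycopg2.connect("dbname=mydb user=me")

    # The CSV "export" is just a query whose result is streamed into a file
    export_sql = """
        COPY (SELECT * FROM my_table ORDER BY id)
        TO STDOUT WITH (FORMAT csv, HEADER)
    """

    with conn, conn.cursor() as cur, open("before.csv", "w") as f:
        cur.copy_expert(export_sql, f)
    conn.close()

Run the same statement before and after the refactoring, into two different files, and the comparison reduces to comparing those files.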

If you're doing this sort of regression test on an active database, it's probably wise to put a distinctive tag on your test records, for example by prepending ETLTEST- to your customer names so John Bull becomes ETLTEST-John Bull. Then your queries can handle only your test records. And make sure you use a reliable ORDER BY, so the exports come out in a deterministic order.
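
In the export sketch above, that would mean swapping the inner query for something like this (column names are hypothetical):

    # Only the tagged test records, with a reliable, deterministic ORDER BY
    test_rows_query = """
        SELECT *
        FROM my_table
        WHERE customer_name LIKE 'ETLTEST-%'
        ORDER BY customer_name, id
    """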

Jupyter seems a complex way to diff your .csv files. Most operating systems have lightweight, fast diff tools.
