Background: In my projects I use Git and DVC to keep track of versions:
- Git - only for source code
- DVC - for datasets, model objects and outputs
I test different approaches in separate branches, e.g.:
- random_forest
- neural_network_1
- ...
Typically, as output I keep the predictions in a CSV file with a standardised name (e.g. pred_test.csv). As a consequence, each branch has its own, different pred_test.csv. The structure of the file is very simple; it contains two columns:
- ID
- Prediction
Question: What is the best way to merge those prediction files into a single big file?
I would like to obtain a file with the following structure:
- ID
- Prediction_random_forest
- Prediction_neural_network_1
- Prediction_...
My main issue is: how do I access prediction files that live in different branches?
Answer:
I would try to use dvc get in this case:
dvc get -o random_forest_pred.csv --rev random_forest . pred_test.csv
It should fetch pred_test.csv from the random_forest branch and save it locally as random_forest_pred.csv.
Please mind the . before pred_test.csv: it's required and means "use the current repo", since dvc get can also be used with other repos (e.g. a GitHub URL).
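If you have many branches, the same dvc get call can be repeated for each of them. Below is a minimal sketch, assuming it is run from the repo root with DVC installed. The branch list, the helper names build_get_command and fetch_predictions, and the "<branch>_pred.csv" naming scheme are hypothetical illustrations, not part of DVC. It defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List

# Hypothetical list of experiment branches; replace with your own.
BRANCHES = ["random_forest", "neural_network_1"]


def build_get_command(branch: str, path: str = "pred_test.csv", repo: str = ".") -> List[str]:
    """Build the `dvc get` call that copies `path` from `branch` into <branch>_pred.csv."""
    return ["dvc", "get", "-o", f"{branch}_pred.csv", "--rev", branch, repo, path]


def fetch_predictions(branches: List[str], dry_run: bool = True) -> List[str]:
    """Fetch the prediction file from every branch; return the local file names.

    With dry_run=True the commands are only printed, not executed.
    """
    outputs = []
    for branch in branches:
        cmd = build_get_command(branch)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if dvc get fails
            subprocess.run(cmd, check=True)
        outputs.append(f"{branch}_pred.csv")
    return outputs


if __name__ == "__main__":
    fetch_predictions(BRANCHES, dry_run=True)
```

Call fetch_predictions(BRANCHES, dry_run=False) to actually run the downloads once the printed commands look right.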
Then I think you could use a CLI tool or write a script to join the files:
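As one possible script, here is a minimal sketch using only Python's standard csv module. It assumes each downloaded file has exactly the header ID,Prediction described in the question. It does an outer join on ID, so an ID missing from one model's file gets an empty cell in that model's column. The function name merge_predictions and the output name pred_all.csv are hypothetical. pandas (e.g. repeated DataFrame.merge with how="outer") would be a common alternative if it's already a dependency.

```python
import csv
from typing import Dict, List


def merge_predictions(files: Dict[str, str], out_path: str) -> List[str]:
    """Outer-join several ID/Prediction CSV files on ID.

    `files` maps a model name (e.g. the branch name) to its CSV path.
    Each input is assumed to have the header "ID,Prediction".
    Returns the header row written to `out_path`.
    """
    # ID -> {column: value}; dicts preserve first-seen ID order
    merged: Dict[str, Dict[str, str]] = {}
    columns = ["ID"]
    for model, path in files.items():
        col = f"Prediction_{model}"
        columns.append(col)
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                merged.setdefault(row["ID"], {"ID": row["ID"]})[col] = row["Prediction"]

    with open(out_path, "w", newline="") as f:
        # restval="" leaves a blank cell when a model has no prediction for an ID
        writer = csv.DictWriter(f, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(merged.values())
    return columns
```

Usage, with the files named after their branches:

merge_predictions({"random_forest": "random_forest_pred.csv", "neural_network_1": "neural_network_1_pred.csv"}, "pred_all.csv")

This produces the columns ID, Prediction_random_forest, Prediction_neural_network_1.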