2 years ago
#51183

Kıvanç Yüksel
Is there an alternative to DVC pipelines for building a DAG that is aware of each node's inputs/outputs and caches results?
I recently started using DVC pipelines to build a DAG in my application. I work on machine learning projects, and I need to experiment a lot with different nodes of my system. For example:
Data preprocessing -> feature extraction -> model training -> model evaluation
Each node produces an output that is consumed by another node. DVC lets me create a pipeline in which I specify the dependencies between nodes. I also use .yaml files to configure the parameters of my application, and these parameters can be declared as dependencies of individual nodes as well. Whenever a dependency of a node changes (either a configuration parameter or a declared input/output), DVC detects this and reruns the necessary parts of the pipeline. If no dependency of a particular node has changed, DVC uses its cache to skip that step. This is really useful for me, since some nodes take a really long time to execute and don't always need to be run (if their dependencies haven't changed).
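To make the behavior I am after concrete, here is a minimal Python sketch of dependency-aware caching. It is not how DVC is implemented, only an illustration of the idea: a stage is rerun when the contents of its input files or its parameters change, and skipped otherwise. The names `fingerprint`, `run_stage`, and the `.stage_cache` directory are made up for this example.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(dep_paths, params):
    """Hash the contents of all dependency files together with the stage parameters."""
    digest = hashlib.sha256()
    for path in sorted(str(p) for p in dep_paths):
        digest.update(path.encode())
        digest.update(Path(path).read_bytes())
    # sort_keys makes the hash independent of dict ordering
    digest.update(json.dumps(params, sort_keys=True).encode())
    return digest.hexdigest()


def run_stage(name, func, dep_paths, params, out_path, state_dir=".stage_cache"):
    """Run func only if inputs or params changed, or the output is missing.

    Returns True if the stage was executed, False if it was skipped.
    """
    state_file = Path(state_dir) / f"{name}.json"
    current = fingerprint(dep_paths, params)

    if state_file.exists() and Path(out_path).exists():
        previous = json.loads(state_file.read_text())["fingerprint"]
        if previous == current:
            return False  # nothing changed, reuse the existing output

    func(dep_paths, params, out_path)

    # Record the fingerprint that produced the current output
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(json.dumps({"fingerprint": current}))
    return True
```

Chaining stages then amounts to passing the output path of one stage as a dependency of the next, so a change upstream changes the fingerprint downstream. What I am looking for is a tool that does this properly (including the config side) rather than a hand-rolled version like the above.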
I also started using Hydra to manage my config files, and to be honest, DVC doesn't work well with it. DVC expects a static config to specify parameter dependencies, which is a bit tricky to provide with Hydra and complicates things.
My question is: is there an alternative to DVC Pipelines that also works well with Hydra?
pipeline
directed-acyclic-graphs
dvc
hydra-core
0 Answers