DataLoader pytorch num_workers
I'm currently looking at this tutorial: https://deeplizard.com/learn/video/kWVgvsejXsE about what the ideal value for num_workers (an optional argument of the DataLoader class) is. If I understand correctly, if you have 2 CPUs, one can be used to load the data while the other does the actual work (forward pass, backpropagation, weight updates). However, why do they limit their tests to 1 epoch? I feel like they should run their tests over several epochs, because new batches have to be loaded for every epoch.
I feel like I might be missing something about how this actually works. Any answer is welcome; I'm quite new, so don't hesitate to give details that may seem obvious from your end.
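
For reference, here is a minimal sketch of how I understand num_workers, using a toy random-tensor dataset instead of my real one just to keep the snippet self-contained (the dataset and shapes are placeholders, not my actual setup):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset of random tensors; stands in for whatever dataset is actually used.
images = torch.randn(10_000, 1, 28, 28)
labels = torch.randint(0, 10, (10_000,))
dataset = TensorDataset(images, labels)

# num_workers=0: batches are prepared in the main process, in between training steps.
# num_workers=2: two worker subprocesses prefetch the next batches while the main
# process runs the forward/backward pass on the current one.
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=2)

for batch_images, batch_labels in loader:
    pass  # forward pass, loss, backward pass, optimizer step would go here
```

(With num_workers > 0 the workers are separate processes, so on Windows/macOS the loop has to sit under an `if __name__ == "__main__":` guard.)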
I just launched my own tests to see whether their results change when I train my neural network for 500 epochs. I launched 20 tasks with 2 CPUs and 20 tasks with 1 CPU, varying num_workers and leaving everything else identical (batch_size included). The ideal num_workers should be 0 for the tasks with 1 CPU and 2 for the tasks with 2 CPUs, right?
What's weird is that I don't see the expected behaviour. On average, though, 2 CPUs is quicker than 1 CPU, so there's that at least.
My batch_size is 32; should I vary batch_size too? Their result was that the timing was the same at every batch_size.
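
To make the test setup concrete, my timing loop looks roughly like the sketch below. The dataset, the training step, the epoch count, and the helper name run_timing_test are all placeholders for illustration; only the num_workers / batch_size arguments are the thing actually being measured:

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def run_timing_test(num_epochs=5, batch_size=32, worker_counts=(0, 1, 2, 4)):
    # Toy random-tensor dataset; swap in the real dataset and training step.
    images = torch.randn(10_000, 1, 28, 28)
    labels = torch.randint(0, 10, (10_000,))
    dataset = TensorDataset(images, labels)

    for num_workers in worker_counts:
        loader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                            num_workers=num_workers)
        start = time.time()
        for _ in range(num_epochs):
            for batch_images, batch_labels in loader:
                pass  # replace with the actual training step
        print(f"num_workers={num_workers}: {time.time() - start:.1f} s")


if __name__ == "__main__":
    run_timing_test()  # num_epochs=500 to match the full test described above
```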
Tags: python, pytorch, dataloader