
Omar AlSuwaidi
Loading & splitting same training data but getting different results
I'm trying to manually split my training data into separate batches so that I can easily access each batch by indexing, rather than relying on DataLoader to do the splitting for me, since that way I can't access the individual batches by index. So I attempted the following:
import numpy as np
from torch.utils.data import DataLoader, Subset
from torchvision import datasets

train_data = datasets.ANY(root='data', transform=T_train, download=True)
BS = 200
num_batches = len(train_data) // BS
sequence = list(range(len(train_data)))
np.random.shuffle(sequence)  # Shuffle the sample indices
subsets = [Subset(train_data, sequence[i * BS: (i + 1) * BS]) for i in range(num_batches)]
train_loader = [DataLoader(sub, batch_size=BS) for sub in subsets]  # One DataLoader per batch of BS samples
This works just fine during training.
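For instance, with this setup I can pull out any batch directly by its index (a minimal usage sketch; the index 3 is arbitrary):
# train_loader is a plain Python list, so it supports indexing
x3, y3 = next(iter(train_loader[3]))  # the 4th batch
print(x3.shape, y3.shape)  # e.g. torch.Size([200, ...]) and torch.Size([200])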
However, when I attempted another way of manually splitting the training data, I got different end results, even with all the same parameters and the following settings in place:
import torch

device = torch.device('cuda')
torch.manual_seed(0)                       # Seed PyTorch's RNG
np.random.seed(0)                          # Seed NumPy's RNG
torch.backends.cudnn.benchmark = False     # Disable cuDNN autotuning
torch.backends.cudnn.deterministic = True  # Force deterministic cuDNN kernels
torch.cuda.empty_cache()
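For completeness, here is a sketch of a broader seeding helper (not part of my original code) that also covers Python's built-in random module and all CUDA devices, in case something else is drawing randomness:
import random

def seed_everything(seed: int = 0) -> None:
    # Seed every RNG that PyTorch training commonly touches
    random.seed(seed)                  # Python's built-in RNG
    np.random.seed(seed)               # NumPy's global RNG
    torch.manual_seed(seed)            # CPU RNG (and current CUDA device)
    torch.cuda.manual_seed_all(seed)   # All CUDA devices
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True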
The only difference this time was how I split the training data:
train_data = list(datasets.ANY(root='data', transform=T_train, download=True))  # Cast the dataset into a list
BS = 200
num_batches = len(train_data) // BS
np.random.shuffle(train_data)  # Shuffle the (image, label) samples directly
train_loader = [DataLoader(train_data[i * BS: (i + 1) * BS], batch_size=BS) for i in range(num_batches)]
But this gives me different results than the first approach, even though (I believe) both approaches are identical in how they manually split the training data into batches. I even tried not shuffling at all and loading the data just as it is, but I still got different results (85.2% vs. 81.98% accuracy). I also manually checked that the loaded images from the batches match, and they are the same with both methods.
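Here is roughly how one could compare the two pipelines batch by batch (a hedged sketch; train_loader_v1 and train_loader_v2 are hypothetical names for the two loader lists, and it assumes no shuffling and no random transforms in T_train, since either would make the tensors differ):
for dl1, dl2 in zip(train_loader_v1, train_loader_v2):
    (x1, y1), (x2, y2) = next(iter(dl1)), next(iter(dl2))
    assert torch.equal(x1, x2) and torch.equal(y1, y2)  # same images, same labels, same order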
The training scheme used in both ways:
for e in trange(epochs):
for loader in train_loader:
for x, y in loader:
x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
loss = F.cross_entropy(m1(x), y)
loss.backward()
optim.step()
scheduler.step()
optim.zero_grad()
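One more check I could imagine (a hypothetical sketch; m1_v1 and m1_v2 stand for copies of the model after one epoch under each loader variant, both starting from the same seeded initialization):
sd1, sd2 = m1_v1.state_dict(), m1_v2.state_dict()
max_diff = max((sd1[k].float() - sd2[k].float()).abs().max().item() for k in sd1)
print(max_diff)  # 0.0 would mean the two pipelines really are equivalent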
Can somebody please explain why these differences arise (and whether there's a better way)?
UPDATE:
The T_train transformation contains some random transformations (H_flip, crop). When using it with the first train_loader, training took 24.79s/it, while the second train_loader took 10.88s/it (even though both have the exact same number of parameter updates/steps). So I decided to remove the random transformations from T_train; then the first train_loader took 16.99s/it, while the second train_loader took 10.87s/it. So somehow, the second train_loader took the same time with or without the random transformations. Thus, I decided to visualize the image outputs from the second train_loader to make sure that the transformations were applied, and indeed they were! So this is really confusing, and I'm not quite sure why the two are giving different results.
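One check that might isolate this (a sketch; the idea is that casting the dataset to a list calls __getitem__ once per sample up front, so the random transforms may get baked in at that point rather than re-drawn on every access; it assumes T_train ends in ToTensor so the samples are tensors):
dataset = datasets.ANY(root='data', transform=T_train, download=True)
cached = list(dataset)  # __getitem__ runs once per sample here, applying T_train once

# Reading the same index twice from the raw dataset re-applies the random transforms...
a, b = dataset[0][0], dataset[0][0]
print(torch.equal(a, b))  # likely False when T_train is random

# ...but the cached list returns the same stored tensor both times
c, d = cached[0][0], cached[0][0]
print(torch.equal(c, d))  # True: the transform was only ever applied once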
Tags: python, pytorch, dataloader, pytorch-dataloader