Loading & splitting the same training data but getting different results

I'm trying to manually split my training data into separate batches so that I can easily access each batch by index, rather than relying on DataLoader to split it up for me (since then I wouldn't be able to access individual batches by indexing). So I attempted the following:

import torch
import numpy as np
from torch.utils.data import DataLoader, Subset
from torchvision import datasets

train_data = datasets.ANY(root='data', transform=T_train, download=True)
BS = 200
num_batches = len(train_data) // BS
sequence = list(range(len(train_data)))
np.random.shuffle(sequence)  # Shuffle the indices to shuffle the training data
# Each Subset lazily references BS samples of the underlying dataset
subsets = [Subset(train_data, sequence[i * BS: (i + 1) * BS]) for i in range(num_batches)]
# One single-batch DataLoader per subset, so each batch is addressable by index
train_loader = [DataLoader(sub, batch_size=BS) for sub in subsets]
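For example, an individual batch can then be pulled out by index (a minimal usage sketch; the shapes in the comment are illustrative, not from my actual dataset):

x, y = next(iter(train_loader[3]))  # Fetch the fourth batch; each DataLoader yields one full batch
print(x.shape, y.shape)             # e.g. torch.Size([200, 3, 32, 32]) and torch.Size([200])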

This works just fine during training.

However, when I tried another way of manually splitting the training data, I got different end results, even with all the same hyperparameters and the following settings in place:

device = torch.device('cuda')
torch.manual_seed(0)                        # Seed the PyTorch RNG
np.random.seed(0)                           # Seed the NumPy RNG used for shuffling
torch.backends.cudnn.benchmark = False      # Disable non-deterministic autotuning
torch.backends.cudnn.deterministic = True   # Force deterministic cuDNN kernels
torch.cuda.empty_cache()
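Note: one could additionally seed Python's built-in random module and the CUDA RNGs; I did not use these in the runs above, so this is only a sketch of extra precautions:

import random
random.seed(0)                 # Python's built-in RNG
torch.cuda.manual_seed_all(0)  # CUDA RNGs on all devices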

This time, the only change was how I split the training data:

train_data = list(datasets.ANY(root='data', transform=T_train, download=True))  # Cast the dataset into a list
BS = 200
num_batches = len(train_data) // BS
np.random.shuffle(train_data)  # Shuffle the (sample, label) pairs in place
train_loader = [DataLoader(train_data[i * BS: (i + 1) * BS], batch_size=BS) for i in range(num_batches)]
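As a sanity check on the two pipelines, here is a minimal probe (a sketch, assuming T_train ends in ToTensor() so samples come out as tensors): it fetches the same sample twice from each pipeline and compares the results.

lazy_ds = datasets.ANY(root='data', transform=T_train, download=True)
a1, _ = lazy_ds[0]            # Transform is applied at access time
a2, _ = lazy_ds[0]
b1, _ = train_data[0]         # The list already holds fixed (image, label) pairs
b2, _ = train_data[0]
print(torch.equal(a1, a2))    # False whenever T_train contains random transforms
print(torch.equal(b1, b2))    # True: transforms were applied once, during list()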

But this gives me different results than the first approach, even though (I believe) both approaches are identical in how they manually split the training data into batches. I even tried not shuffling at all and loading the data just as it is, but I still got different results (85.2% vs. 81.98% accuracy). I even manually checked that the images loaded from the batches match, and that they are the same across both methods.

The training loop used in both cases:

from tqdm import trange
import torch.nn.functional as F

for e in trange(epochs):
    for loader in train_loader:  # Iterate over the manually created batches
        for x, y in loader:      # Each DataLoader yields exactly one batch
            x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
            loss = F.cross_entropy(m1(x), y)
            loss.backward()
            optim.step()
            scheduler.step()     # Per-step (not per-epoch) LR schedule
            optim.zero_grad()

Can somebody please explain why these differences arise (and whether there's a better way to do this)?
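For context, the obvious alternative would be a single DataLoader with seeded shuffling, something like the sketch below (the generator argument is the standard way to make shuffling reproducible), but that would not let me address individual batches by index, which is why I split manually:

g = torch.Generator().manual_seed(0)  # Reproducible shuffling
loader = DataLoader(train_data, batch_size=BS, shuffle=True, generator=g, drop_last=True)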

UPDATE:

The T_train transformation contains some random transformations (horizontal flip, crop). When using it with the first train_loader, training took 24.79s/it, while the second train_loader took 10.88s/it (even though both perform exactly the same number of parameter updates/steps). So I removed the random transformations from T_train; the first train_loader then took 16.99s/it, while the second still took 10.87s/it. Somehow, the second train_loader takes the same time with or without the random transformations. I therefore visualized the image outputs from the second train_loader to make sure the transformations were applied, and indeed they were! So this is really confusing, and I'm not quite sure why the two loaders give different results.
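To isolate data-loading/transform cost from the model itself, one could time a bare pass over each loader list (a minimal sketch using only the standard library):

import time

start = time.perf_counter()
for loader in train_loader:
    for x, y in loader:
        pass  # Only iterate, so the measured time is loading + transforms
print(f'{time.perf_counter() - start:.2f}s for one full pass')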

Tags: python, pytorch, dataloader, pytorch-dataloader
