How to get mini-batches in pytorch in a clean and efficient way?

Hello guys, how are you all? Hope you are all fine. Today we are going to learn how to get mini-batches in PyTorch in a clean and efficient way in Python, and I will explain all the possible methods here.

Without wasting your time, let's start this article.

Table of Contents

How to get mini-batches in PyTorch in a clean and efficient way?

  1. Method 1: batch manually with torch.randperm
  2. Method 2: use torch.utils.data.Dataset and DataLoader

Method 1

If I'm understanding your code correctly, your get_batch2 function draws random mini-batches from your dataset without tracking which indices have already been used in the current epoch. The problem with that approach is that it will likely not make use of all of your data.

The way I usually do batching is to create a random permutation of all the possible indices with torch.randperm(N) and loop through it in batches. For example:

n_epochs = 100   # or whatever
batch_size = 128 # or whatever

for epoch in range(n_epochs):
    # X is a torch tensor; reshuffle the sample indices every epoch
    permutation = torch.randperm(X.size(0))

    for i in range(0, X.size(0), batch_size):
        optimizer.zero_grad()

        # take the next slice of the shuffled indices
        indices = permutation[i:i + batch_size]
        batch_x, batch_y = X[indices], Y[indices]

        # in case you wanted a semi-full example
        outputs = model(batch_x)  # calling the model runs its forward()
        loss = lossfunction(outputs, batch_y)

        loss.backward()
        optimizer.step()

If you'd like to copy and paste this, make sure you define your optimizer, model, and lossfunction somewhere before the start of the epoch loop.
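
For reference, a minimal setup might look like the sketch below. The data shapes, model, learning rate, and loss function here are placeholders for illustration, not taken from the original question:

import torch
import torch.nn as nn

# hypothetical data: 1000 samples, 10 features, 3 classes
X = torch.randn(1000, 10)
Y = torch.randint(0, 3, (1000,))

model = nn.Linear(10, 3)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
lossfunction = nn.CrossEntropyLoss()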

With regards to your error, try using torch.from_numpy(np.random.randint(0, N, size=M)).long() instead of torch.LongTensor(np.random.randint(0, N, size=M)). I'm not sure whether this will solve the error you are getting, but it will head off a future one.
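
For example, to draw M random indices out of N (the sizes here are only illustrative):

import numpy as np
import torch

N, M = 1000, 128  # dataset size and batch size, for illustration

# wrap the numpy index array as a tensor, then cast to int64
indices = torch.from_numpy(np.random.randint(0, N, size=M)).long()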

Method 2

You can use torch.utils.data.

Assuming you have loaded the data from disk into train and test numpy arrays, you can inherit from the torch.utils.data.Dataset class to create your dataset object:

from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, x, y):
        super().__init__()
        assert x.shape[0] == y.shape[0]  # shape[0] is the dataset size
        self.x = x
        self.y = y

    def __len__(self):
        # number of samples in the dataset
        return self.y.shape[0]

    def __getitem__(self, index):
        # return one (input, target) pair
        return self.x[index], self.y[index]

Then, create your dataset object:

traindata = MyDataset(train_x, train_y)
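
As a quick sanity check (assuming train_x and train_y are numpy arrays you have already loaded), you can confirm that __len__ and __getitem__ behave as expected:

print(len(traindata))  # dataset size, via __len__
x0, y0 = traindata[0]  # first (input, target) pair, via __getitem__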

Finally, use DataLoader to create your mini-batches:

trainloader = torch.utils.data.DataLoader(traindata, batch_size=64, shuffle=True)
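
A typical training loop then just iterates over the loader. As a minimal sketch (model, optimizer, lossfunction, and n_epochs are placeholders you would define yourself, as in Method 1):

n_epochs = 10  # or whatever

for epoch in range(n_epochs):
    for batch_x, batch_y in trainloader:
        # the default collate function stacks numpy samples into tensors;
        # cast if your model expects float32 inputs
        batch_x = batch_x.float()

        optimizer.zero_grad()
        outputs = model(batch_x)
        loss = lossfunction(outputs, batch_y)
        loss.backward()
        optimizer.step()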

Summary

That's all about this issue. I hope these methods helped you. Comment below with your thoughts and your queries, and let us know which method worked for you. Thank you.
