How to convert a list of strings into a tensor in PyTorch?

Hello guys, how are you all? Hope you are all fine. Today we are going to learn how to convert a list of strings into a tensor in PyTorch. I will explain the possible methods here.

Without wasting your time, let's start the article.

Method 1

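Unfortunately, you can't do this directly right now: PyTorch tensors only hold numeric and boolean data, not strings. And I don't think string support would be a good idea, since it would make PyTorch clumsy. A popular workaround is to convert the strings into numeric labels using sklearn's LabelEncoder.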

Here is a short example:

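from sklearn import preprocessing
import torch

labels = ['cat', 'dog', 'mouse', 'elephant', 'pandas']
le = preprocessing.LabelEncoder()
# LabelEncoder sorts the unique labels, then maps each label to its index
# in that sorted order, so 'elephant' (2) comes before 'mouse' (3).
targets = le.fit_transform(labels)
# targets: array([0, 1, 3, 2, 4])

targets = torch.as_tensor(targets)
# targets: tensor([0, 1, 3, 2, 4])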

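Since you may need to convert between the original labels and the encoded ones later, it is a good idea to keep the fitted encoder le around. Its inverse_transform method maps the integers back to the original strings.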
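As a minimal sketch of that round trip, the snippet below encodes the same labels and then recovers them from the tensor. The variable names classes and recovered are only illustrative, and it assumes the tensor lives on the CPU so that .numpy() works.

```python
from sklearn import preprocessing
import torch

labels = ['cat', 'dog', 'mouse', 'elephant', 'pandas']
le = preprocessing.LabelEncoder()
targets = torch.as_tensor(le.fit_transform(labels))

# classes_ holds the sorted unique labels; position i is the string for code i.
classes = list(le.classes_)

# inverse_transform expects an array-like of codes, so convert the
# (CPU) tensor to a NumPy array before passing it in.
recovered = list(le.inverse_transform(targets.numpy()))

print(classes)
print(recovered)
```

If the encoder is needed in another process, it would have to be saved alongside the model, since the integer codes are meaningless without it.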

Method 2

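The trick is to first find the maximum byte length of any word in the list, and then, in a second loop, copy each word's bytes into a zero-initialized tensor so that shorter words end up padded with zeros. Note that UTF-8 is a variable-length encoding: ASCII characters take one byte each, the Hebrew letters in this example take two bytes each, and other characters can take up to four.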
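To see why the padding width has to be measured in bytes rather than characters, here is a small sketch in plain Python (no PyTorch needed). The names char_lengths and byte_lengths are only illustrative.

```python
words = ['שלום', 'beautiful', 'world']

# Number of characters in each word.
char_lengths = [len(w) for w in words]

# Number of bytes each word occupies once encoded as UTF-8.
byte_lengths = [len(w.encode('utf8')) for w in words]

# The padded tensor needs one column per byte of the longest encoded word.
max_l = max(byte_lengths)

for w, n_chars, n_bytes in zip(words, char_lengths, byte_lengths):
    print(w, n_chars, n_bytes)
```

The Hebrew word has four characters but eight bytes, which is why it nearly fills the nine-column tensor in the example.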

In[]
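import torch

words = ['שלום', 'beautiful', 'world']
max_l = 0
ts_list = []
# First loop: encode each word as UTF-8 bytes and track the longest one.
for w in words:
    ts_list.append(torch.tensor(list(w.encode('utf8')), dtype=torch.uint8))
    max_l = max(ts_list[-1].size()[0], max_l)

# Second loop: copy each byte tensor into a zero-filled (words x max_l) tensor.
w_t = torch.zeros((len(ts_list), max_l), dtype=torch.uint8)
for i, ts in enumerate(ts_list):
    w_t[i, 0:ts.size()[0]] = ts
w_t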

Out[]
tensor([[215, 169, 215, 156, 215, 149, 215, 157,   0],
        [ 98, 101,  97, 117, 116, 105, 102, 117, 108],
        [119, 111, 114, 108, 100,   0,   0,   0,   0]], dtype=torch.uint8)
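A byte tensor like this is only useful if the strings can be recovered from it. The sketch below builds the same padded tensor with torch.nn.utils.rnn.pad_sequence instead of the manual second loop, then decodes it back. The helper decode_rows is a hypothetical name, and it assumes that no original string ends in a NUL character, since trailing zero bytes are treated as padding.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

words = ['שלום', 'beautiful', 'world']

# Encode each word as UTF-8 bytes and wrap the bytes in a uint8 tensor.
encoded = [torch.tensor(list(w.encode('utf8')), dtype=torch.uint8) for w in words]

# pad_sequence right-pads every sequence to the length of the longest one.
w_t = pad_sequence(encoded, batch_first=True, padding_value=0)


def decode_rows(t):
    """Turn a zero-padded uint8 tensor back into a list of strings.

    Assumes trailing zero bytes are padding, not part of the text.
    """
    return [bytes(row.tolist()).rstrip(b'\x00').decode('utf8') for row in t]


decoded = decode_rows(w_t)
print(decoded)
```

Stripping only trailing zeros keeps the decoding independent of how many bytes each character uses.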

Summary

That's all about this issue. I hope these methods helped you. Comment below with your thoughts and questions, and let me know which method worked for you. Thank you.