
[Solved] Getting ValueError: y contains new labels when using scikit learn’s LabelEncoder

Hello guys, how are you all? I hope you are all fine. Today I got the following error in Python: ValueError: y contains new labels when using scikit-learn's LabelEncoder. So here I will explain all the possible solutions.

Without wasting your time, let's start this article and solve this error.

How Does ValueError: y contains new labels Occur When Using scikit-learn's LabelEncoder?

You get this error when you fit a LabelEncoder on one set of labels (usually your training data) and then call transform on data that contains a label the encoder has never seen (usually your test data).
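A minimal sketch that reproduces the error, assuming a small hypothetical split in which the test data contains a label ("london") that the training data lacks. The exact wording depends on your scikit-learn version: older releases say "y contains new labels", while newer ones say "y contains previously unseen labels".

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical training and test label columns.
train_ids = ["paris", "tokyo", "paris", "amsterdam"]
test_ids = ["tokyo", "london"]  # "london" never appears in train_ids

le = LabelEncoder()
le.fit(train_ids)  # the encoder only learns amsterdam, paris and tokyo

try:
    le.transform(test_ids)
except ValueError as err:
    # Raised because "london" is not in le.classes_.
    print("ValueError:", err)
```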

How To Solve ValueError: y contains new labels When Using scikit-learn's LabelEncoder?

In short, the encoder has to know every label before it can encode it. Either fit it on all labels that can occur (for example, call fit_transform, which runs fit and transform in one step, on the complete column before you split it), or replace unseen labels with a fallback value that the encoder already knows. Both approaches are explained below.

Solution 1

I think the error message is very clear: your test dataset contains ID labels that are not included in your training dataset. For these items, the LabelEncoder cannot find a numeric value to represent them. There are a few ways to solve this problem. You can try to balance your dataset so that every label appears not only in your test data but also in your training data. Otherwise, you can follow one of the ideas below.

One possible solution is to search through your whole dataset at the beginning, get a list of all unique ID values, train the LabelEncoder on this list, and keep the rest of your code as it is.
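A minimal sketch of this idea, assuming hypothetical train_ids and test_ids lists: the encoder is fitted once on the unique values of both splits and then reused for each split.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical training and test label columns.
train_ids = ["paris", "tokyo", "paris", "amsterdam"]
test_ids = ["tokyo", "london"]

# Collect every unique ID from both splits and fit the encoder once.
all_ids = np.unique(np.concatenate([train_ids, test_ids]))
le = LabelEncoder()
le.fit(all_ids)

# The same fitted encoder now transforms both splits without errors,
# and each ID gets the same code in both.
train_codes = le.transform(train_ids)  # [2, 3, 2, 0]
test_codes = le.transform(test_ids)    # [3, 1]
```

This only prevents the crash: a label that appears only in the test data still gets a code the model never saw during training.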

Another possible solution is to check that the test data only contains labels that were seen during training. If there is a new label, set it to a fallback value such as unknown_id (or something similar); the fallback itself must be one of the labels the encoder was fitted on. This puts all new, unknown IDs into one class. Predictions for these items will be unreliable, but you can keep the rest of your code as it is now.
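A minimal sketch of the fallback approach, assuming hypothetical train_ids and test_ids lists and a fallback label named "unknown_id" (any string that never occurs as a real ID would do). The encoder is fitted on the training labels plus the fallback, and unseen test labels are replaced before transform is called.

```python
from sklearn.preprocessing import LabelEncoder

UNKNOWN = "unknown_id"  # hypothetical fallback label; must not clash with a real ID

# Hypothetical training and test label columns.
train_ids = ["paris", "tokyo", "paris", "amsterdam"]
test_ids = ["tokyo", "london"]

# Fit on the training labels plus the fallback, so the fallback has a code.
le = LabelEncoder()
le.fit(list(train_ids) + [UNKNOWN])

# Replace any test label the encoder has not seen with the fallback.
known = set(le.classes_)
test_safe = [x if x in known else UNKNOWN for x in test_ids]

test_codes = le.transform(test_safe)  # "london" is encoded as UNKNOWN
```

If the column is an input feature rather than the target, scikit-learn's OrdinalEncoder (version 0.24 and later) can do this mapping itself through handle_unknown="use_encoded_value" together with unknown_value.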

Solution 2

I hope this helps someone, as this answer is more recent.

sklearn's fit_transform performs the fit and transform functions in a single step, so the label encoding is learned directly from the column being encoded. To stop the y labels from throwing this error for unseen values, use:

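```python
from sklearn.preprocessing import LabelEncoder
Col = ["paris", "tokyo", "paris", "amsterdam"]  # example label column; use your own data
le = LabelEncoder()
# Fit and encode in one step; the encoder learns exactly the labels in Col.
encoded = le.fit_transform(Col)  # array([1, 2, 1, 0])
```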

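This avoids the error, because the encoder is fitted on exactly the values it encodes. Be careful, though: if you call fit_transform separately on the training and test columns, the same label can receive different codes in each set. To keep the codes consistent, call it once on the complete column before splitting, or fit one encoder and reuse it with transform.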
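A small sketch with hypothetical data, showing what happens to the codes when fit_transform is called separately on training and test labels: each call refits a fresh encoder, so the mapping depends on which labels happen to be in that column.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical training and test label columns.
train_ids = ["paris", "tokyo", "paris", "amsterdam"]
test_ids = ["tokyo", "paris"]

# Each fit_transform call fits a new mapping on its own input.
train_codes = LabelEncoder().fit_transform(train_ids)
test_codes = LabelEncoder().fit_transform(test_ids)

print(train_codes)  # [1 2 1 0] -> paris=1, tokyo=2
print(test_codes)   # [1 0]     -> tokyo=1, paris=0, so the codes no longer match
```

No error is raised here, which makes the mismatch easy to miss; a model trained on train_codes would read the test codes with the wrong meaning.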

Summary

That's all about this issue. I hope these solutions helped you. Comment below with your thoughts and questions, and let us know which solution worked for you. Thank you.
