
[solved] sklearn train_test_split – ValueError: Found input variables with inconsistent numbers of samples

Hello everyone, I hope you are all doing well. Today I ran into the following error when calling sklearn's train_test_split in Python: ValueError: Found input variables with inconsistent numbers of samples. In this article I explain the possible solutions.

Without wasting your time, let's get started.

How Does the sklearn train_test_split – ValueError: Found input variables with inconsistent numbers of samples Error Occur?

I got the following error when calling train_test_split in Python:

ValueError: Found input variables with inconsistent numbers of samples

How To Solve the sklearn train_test_split – ValueError: Found input variables with inconsistent numbers of samples Error?

This error means that the inputs you are trying to split do not all have the same length. sklearn applies the same row indices to X and y, chosen as a random sample of your data (75% for the training set by default, and often set to 80%), so the lengths have to match.

Solution 1

As you stated, the original shape of labels is (83292, 5), and after you applied MultiLabelBinarizer it became (5, 18).

train_test_split(X, y) expects X and y to have the same number of rows. For example, if X holds 83292 data points, y must hold the label for each of those data points. In your case, the shapes of X and y should therefore be (83292, 15) and (83292, 18).
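To make the row-count requirement concrete, here is a minimal sketch that reproduces the error on toy data. The array sizes are made up for illustration and are not the shapes from the question.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for the real data: 10 samples in X but only 5 rows in y.
X = np.zeros((10, 3))
y_bad = np.zeros((5, 4))

try:
    train_test_split(X, y_bad)
    error_message = None
except ValueError as exc:
    # sklearn reports the mismatching lengths in the message.
    error_message = str(exc)

print(error_message)

# With matching row counts the same call succeeds.
y_ok = np.zeros((10, 4))
X_train, X_test, y_train, y_test = train_test_split(X, y_ok, random_state=0)
print(X_train.shape, y_train.shape)
```

The number of columns in X and y can differ freely; only the first dimension has to agree.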

Try this: your MultiLabelBinarizer output has the wrong dimensions. This is most likely because the DataFrame itself was passed in, and iterating over a DataFrame yields its column names rather than its rows. If labels is a DataFrame, apply mlb.fit_transform(labels.values.tolist()) instead. This produces one output row per input row, 83292 here.

Your labels should look like the example below. The y input can be a list of lists, or a DataFrame with a single column whose cells each hold a list of values; in that case dataframe.shape should be (no_of_rows, 1). Either way, make sure X and y have the same number of rows. A multi-label, multi-class y can be represented like this:

[[1, 1, 25, 0, 0],
 [1, 1, 25, 0, 0],
 [1, 1, 25, 0, 0],
 [1, 1, 25, 0, 0],
 [1, 1, 25, 0, 0],
 [3, 5, 50, 0, 0],
 [3, 5, 50, 0, 0],
 [3, 5, 50, 0, 0],
 [3, 5, 50, 0, 0],
 [3, 5, 50, 0, 0]]
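As a rough sketch of the fix, the snippet below builds a small DataFrame from the ten example rows above and binarizes it through labels.values.tolist(). The column names are hypothetical, and the real data would have 83292 rows rather than 10.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical labels frame built from the example rows above;
# the column names are made up for illustration.
rows = [[1, 1, 25, 0, 0]] * 5 + [[3, 5, 50, 0, 0]] * 5
labels = pd.DataFrame(rows, columns=["a", "b", "c", "d", "e"])

mlb = MultiLabelBinarizer()
# Convert the frame to a list of lists so each row is treated as one sample.
y = mlb.fit_transform(labels.values.tolist())

print(labels.shape)  # (10, 5)
print(y.shape)       # (10, 6)
print(mlb.classes_)  # the distinct label values found across all rows
```

The output keeps one row per input row, so y can be passed to train_test_split together with an X of the same length.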

Solution 2

This means that the inputs you are trying to split do not all have the same length. sklearn applies the same row indices to X and y, chosen as a random sample of your data (75% for the training set by default, and often set to 80%), so the lengths have to match.

Imagine sklearn is trying to keep the indices marked below. What should it do when one of the inputs has nothing at a chosen index?

 0 1 0 0 1 0 1 0 0 1 0 1 0 1
 a b b a b a b a a b b b 
 ^   ^     ^ ^   ^   ^   ^ ^ 

Run this check to verify that the lengths match. Does it return True?

len(dataset) == len(labels)
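The same check can be wrapped in a small guard that runs before the split. This is only a sketch: split_if_consistent is a hypothetical helper name, and the toy dataset and labels are invented for the example.

```python
from sklearn.model_selection import train_test_split


def split_if_consistent(dataset, labels, test_size=0.2, random_state=0):
    """Hypothetical helper: check lengths, then delegate to train_test_split."""
    if len(dataset) != len(labels):
        raise ValueError(
            f"dataset has {len(dataset)} rows but labels has {len(labels)}"
        )
    return train_test_split(
        dataset, labels, test_size=test_size, random_state=random_state
    )


# Toy data: 10 samples and 10 labels, so the lengths match.
dataset = [[i, i + 1] for i in range(10)]
labels = ["a", "b"] * 5

X_train, X_test, y_train, y_test = split_if_consistent(dataset, labels)
print(len(X_train), len(X_test))  # 8 2
```

Failing early with both lengths in the message makes it easier to see which input lost or gained rows during preprocessing.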

Summary

That's all about this issue. I hope one of these solutions helped you. Comment below with your thoughts and questions, and let me know which solution worked for you. Thank you.
