[Solved] ValueError: Number of features of the model must match the input

Hello everyone. Today I ran into the following error in Python: ValueError: Number of features of the model must match the input. In this article I explain the possible solutions.

Let’s get started.

How Does the ValueError: Number of features of the model must match the input Error Occur?

The error appears when a trained model is asked to predict on input that has a different number of feature columns than the data it was trained on.

How To Solve the ValueError: Number of features of the model must match the input Error?

  1. How To Solve the ValueError: Number of features of the model must match the input Error?

    Concatenate your training and scoring datasets, apply get_dummies once to the combined data, and then split the datasets again so that both end up with the same dummy columns.

  2. ValueError: Number of features of the model must match the input

    The error occurs because the features you encode with get_dummies contain different distinct values in the training and scoring datasets, so each dataset ends up with a different number of dummy columns.

Solution 1

You’re getting the error because the features you encode with get_dummies contain different distinct values in your two datasets.

Let’s suppose the Word_1 column in your training set has the following distinct words: the, dog, jumps, roof, off. That’s 5 distinct words, so pandas will generate 5 features for Word_1. Now, if your scoring dataset has a different number of distinct words in the Word_1 column, you’re going to get a different number of features, and the model will reject the input.
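A minimal sketch of that mismatch, using made-up toy frames rather than the original CSV files (the data values and variable names here are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical stand-ins for the training and scoring data
train = pd.DataFrame({"Word_1": ["the", "dog", "jumps", "roof", "off"]})
score = pd.DataFrame({"Word_1": ["the", "cat", "sleeps"]})

# Encoding each frame separately yields one column per distinct value
train_dummies = pd.get_dummies(train, columns=["Word_1"])
score_dummies = pd.get_dummies(score, columns=["Word_1"])

print(train_dummies.shape[1])  # 5 feature columns
print(score_dummies.shape[1])  # 3 feature columns
```

A model fitted on the 5-column frame would then be handed only 3 columns at prediction time, which is the situation the error message describes.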

How to fix:

Concatenate your training and scoring datasets with concat, apply get_dummies, and then split the datasets again. That ensures every distinct value in your columns is captured in both. Since you’re reading from two different CSV files, add a column that marks each row as belonging to the training or the scoring dataset so you can split them afterwards.

Example solution:

import pandas as pd

train_df = pd.read_csv("Cinderella.csv")
train_df['label'] = 'train'

score_df = pd.read_csv("Slaughterhouse_copy.csv")
score_df['label'] = 'score'

# Concat
concat_df = pd.concat([train_df, score_df])

# Create your dummies for Overall_Sentiment and Word_1 through Word_43
dummy_columns = ['Overall_Sentiment'] + [f'Word_{i}' for i in range(1, 44)]
features_df = pd.get_dummies(concat_df, columns=dummy_columns, dummy_na=True)

# Split your data
train_df = features_df[features_df['label'] == 'train']
score_df = features_df[features_df['label'] == 'score']

# Drop your labels
train_df = train_df.drop('label', axis=1)
score_df = score_df.drop('label', axis=1)

# Now delete your 'slope' feature, create your features matrix, and create your model as you have already shown in your example
...
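
To check the idea without the original CSV files, here is a small self-contained sketch of the same concat, get_dummies, split sequence. The toy data and the train_out / score_out names are assumptions made up for this example:

```python
import pandas as pd

# Hypothetical toy data standing in for the two CSV files
train_df = pd.DataFrame({"Word_1": ["the", "dog", "jumps"], "slope": [1, 0, 1]})
train_df["label"] = "train"
score_df = pd.DataFrame({"Word_1": ["the", "cat"], "slope": [0, 1]})
score_df["label"] = "score"

# Encode once on the combined data so both parts share the same columns
concat_df = pd.concat([train_df, score_df], ignore_index=True)
features_df = pd.get_dummies(concat_df, columns=["Word_1"], dummy_na=True)

# Split back by the marker column, then drop it
train_out = features_df[features_df["label"] == "train"].drop("label", axis=1)
score_out = features_df[features_df["label"] == "score"].drop("label", axis=1)

# Both frames now have identical feature columns
print(list(train_out.columns) == list(score_out.columns))
```

Because get_dummies sees every distinct word before the split, a word that only occurs in the scoring data still gets a column (filled with zeros or False) in the training frame.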

Solution 2

I tried the method suggested here and ended up one-hot encoding the label column as well; in the dataframe it shows up as ‘label_test‘ and ‘label_train‘. So, as a heads up, try this after get_dummies:

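# Select rows by their one-hot encoded label (1 marks membership)
train_df = feature_df[feature_df['label_train'] == 1]
test_df = feature_df[feature_df['label_test'] == 1]
# Drop both encoded label columns so they are not used as features
train_df = train_df.drop(['label_train', 'label_test'], axis=1)
test_df = test_df.drop(['label_train', 'label_test'], axis=1)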
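
The label column gets encoded when get_dummies is called without a columns argument, because pandas then encodes every string-like column. The sketch below illustrates both cases on made-up data (the frame and the variable names are assumptions for this example):

```python
import pandas as pd

# Hypothetical combined frame with a train/test marker column
df = pd.DataFrame({
    "Word_1": ["the", "dog", "cat"],
    "label": ["train", "train", "test"],
})

# No columns argument: the marker column is one-hot encoded too
encoded_all = pd.get_dummies(df)
train_rows = encoded_all[encoded_all["label_train"] == 1]
test_rows = encoded_all[encoded_all["label_test"] == 1]

# Explicit columns argument: the marker column is left untouched
encoded_some = pd.get_dummies(df, columns=["Word_1"])

print("label_train" in encoded_all.columns)
print("label" in encoded_some.columns)
```

Passing the columns to encode explicitly keeps the marker column intact, so the split from Solution 1 works unchanged.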

Summary

That’s all about this issue. I hope these solutions helped you. Comment below with your thoughts and questions, and let me know which solution worked for you. Thank you.