
How to use sklearn fit_transform with pandas and return dataframe instead of numpy array?

This article shows how to use scikit-learn's fit_transform with a pandas DataFrame and get a DataFrame back instead of a NumPy array. Two methods are covered below.

Method 1

Convert the DataFrame to a NumPy array, scale it, and then build a new DataFrame from the result. Older answers use as_matrix() for the conversion, but the as_matrix() docs themselves say "Generally, it is recommended to use '.values'", and as_matrix() was removed in pandas 1.0. The code below therefore uses .values, which gives the same result. Example on a random dataset:

import numpy as np  # for the random integer example
import pandas as pd

# 10 rows x 4 columns of random integers in [0, 100), stored as floats
df = pd.DataFrame(np.random.randint(0, 100, size=(10, 4)),
                  index=range(10, 20),
                  columns=['col1', 'col2', 'col3', 'col4'],
                  dtype='float64')

Note, indices are 10-19:

In [14]: df.head(3)
Out[14]:
    col1  col2  col3  col4
10     3    38    86    65
11    98     3    66    68
12    88    46    35    68

Now fit_transform the DataFrame to get the scaled_features array:

from sklearn.preprocessing import StandardScaler
scaled_features = StandardScaler().fit_transform(df.values)

In [15]: scaled_features[:3,:] #lost the indices
Out[15]:
array([[-1.89007341,  0.05636005,  1.74514417,  0.46669562],
       [ 1.26558518, -1.35264122,  0.82178747,  0.59282958],
       [ 0.93341059,  0.37841748, -0.60941542,  0.59282958]])

Assign the scaled data to a new DataFrame. Pass the index and columns keyword arguments to keep the original index and column names:

scaled_features_df = pd.DataFrame(scaled_features, index=df.index, columns=df.columns)

In [17]: scaled_features_df.head(3)
Out[17]:
        col1      col2      col3      col4
10 -1.890073  0.056360  1.745144  0.466696
11  1.265585 -1.352641  0.821787  0.592830
12  0.933411  0.378417 -0.609415  0.592830
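
The steps of Method 1 can be wrapped in a small function so the index and column names are never dropped by accident. The following is only a sketch: scale_to_frame is a hypothetical helper name (it is not part of scikit-learn or pandas), and the seeded random data is an assumption made so the example is reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_to_frame(frame, scaler=None):
    """Fit a scaler on `frame` and return the scaled values as a DataFrame.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    scaler = StandardScaler() if scaler is None else scaler
    # fit_transform returns a plain NumPy array without labels
    scaled = scaler.fit_transform(frame.to_numpy())
    # Re-attach the original row labels and column names
    return pd.DataFrame(scaled, index=frame.index, columns=frame.columns)


# Same shape and labels as the example above, with a fixed seed (assumption)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(0, 100, size=(10, 4)).astype("float64"),
    index=range(10, 20),
    columns=["col1", "col2", "col3", "col4"],
)
scaled_df = scale_to_frame(df)
```

After this runs, scaled_df keeps the 10-19 index and the col1-col4 names, and each column has a mean close to 0 and a population standard deviation close to 1, which is what StandardScaler is expected to produce.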

Method 2

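import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('your file here')  # placeholder path; all columns must be numeric
ss = StandardScaler()
# Pass index and columns so the row labels and column names are kept
df_scaled = pd.DataFrame(ss.fit_transform(df), index=df.index, columns=df.columns)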

df_scaled has the same shape, index, and column names as df, but now holds the scaled values.

Summary

That is all for this issue. Hopefully one of these methods helped. Comment below with your thoughts and questions, and let us know which method worked for you. Thank you.
