
[Solved] ‘DataFrame’ object has no attribute ‘withColumn’

Hello, everyone. Today I ran into the following error in Python: ‘DataFrame’ object has no attribute ‘withColumn’. In this article I explain why it happens and go through the possible solutions.

Without wasting your time, let’s get started and solve this error.

How Does the ‘DataFrame’ object has no attribute ‘withColumn’ Error Occur?

The error is raised in Python when withColumn is called on a pandas DataFrame. withColumn is a method of Spark DataFrames, and pandas DataFrames do not have it.

How to Solve the ‘DataFrame’ object has no attribute ‘withColumn’ Error?

  1. How to solve the 'DataFrame' object has no attribute 'withColumn' error?

    The error occurs because the DataFrames were set up as pandas DataFrames and not Spark DataFrames. For joins with pandas DataFrames, use DataFrame.join, as described in Solution 2.

  2. 'DataFrame' object has no attribute 'withColumn'

    withColumn is available only on Spark DataFrames. Either keep working with pandas operations, or convert the pandas DataFrame to a Spark DataFrame first.

Solution 1

Stay in pandas instead of calling withColumn: write a function that classifies each row, then apply it row by row with DataFrame.apply.

import pandas as pd

# pd_merge is assumed to be an already merged pandas DataFrame that
# contains the columns data_type_x and data_type_y.
def res(row):
    """Classify one row by comparing its two data type columns."""
    if row['data_type_x'] == row['data_type_y']:
        return 'no change'
    elif pd.isnull(row['data_type_x']):
        return 'new attribute'
    elif pd.isnull(row['data_type_y']):
        return 'deleted attribute'
    else:
        # Both values are present and they differ.
        return 'datatype change'

# axis=1 passes each row to res as a Series.
pd_merge['result'] = pd_merge.apply(res, axis=1)
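
For larger frames, the same classification can be done without a row-by-row apply. The sketch below is only an illustration: the two sample frames and the attribute column are made up, and np.select is swapped in for the function above. It assumes pd_merge comes from an outer merge, which is what produces the data_type_x and data_type_y columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample schemas, invented for this example.
old = pd.DataFrame({"attribute": ["id", "name", "age"],
                    "data_type": ["int", "string", "int"]})
new = pd.DataFrame({"attribute": ["id", "name", "email"],
                    "data_type": ["int", "text", "string"]})

# An outer merge keeps attributes that exist on only one side.
# The overlapping data_type column gets the _x and _y suffixes.
pd_merge = old.merge(new, on="attribute", how="outer")

x = pd_merge["data_type_x"]
y = pd_merge["data_type_y"]

# Conditions are checked in order; the first match wins.
conditions = [x.isna(), y.isna(), x == y]
choices = ["new attribute", "deleted attribute", "no change"]
pd_merge["result"] = np.select(conditions, choices, default="datatype change")

print(pd_merge)
```

The missing-value checks come first here, but the outcome matches the function above, because a comparison with a missing value is never equal.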

Solution 2

The error occurs because the DataFrames were set up as pandas DataFrames and not Spark DataFrames. For joins with pandas DataFrames, use DataFrame.join:

DataFrame_output = DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
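
As a concrete illustration of the call above, here is a minimal sketch with two made-up frames. DataFrame.join matches rows on the index by default. When both frames share a column name, pandas raises a ValueError unless lsuffix or rsuffix is set, so the example passes both.

```python
import pandas as pd

# Hypothetical frames, indexed by attribute name.
left = pd.DataFrame({"data_type": ["int", "string"]}, index=["id", "name"])
right = pd.DataFrame({"data_type": ["int", "text"]}, index=["id", "email"])

# how="left" keeps every row of the left frame. Rows that have no
# match on the right get a missing value in the right-hand columns.
joined = left.join(right, how="left", lsuffix="_x", rsuffix="_y")

print(joined)
```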

Run this to check which kind of DataFrame you have:

type(df)
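
The same check can be done in code. The helper below is a hypothetical sketch (describe_frame is not part of pandas or PySpark). It looks at the module name of the object's class so that PySpark does not have to be installed, assuming Spark DataFrame classes live in a module whose name starts with pyspark.

```python
import pandas as pd

def describe_frame(df):
    """Return 'pandas', 'spark' or 'unknown' for the given object."""
    if isinstance(df, pd.DataFrame):
        return "pandas"
    # Assumption: Spark DataFrame classes are defined under the pyspark package.
    if type(df).__module__.startswith("pyspark"):
        return "spark"
    return "unknown"

pd_df = pd.DataFrame({"a": [1, 2]})

print(describe_frame(pd_df))
# A pandas DataFrame has no withColumn attribute, which is what the error reports.
print(hasattr(pd_df, "withColumn"))
```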

To use withColumn, you need a Spark DataFrame. To convert a pandas DataFrame to a Spark DataFrame, use this:

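from pyspark.sql import SparkSession
import pandas as pd

# pd_df1 is assumed to be an existing pandas DataFrame.
spark = SparkSession.builder.appName('pandasToSparkDF').getOrCreate()

# createDataFrame accepts a pandas DataFrame and returns a Spark DataFrame.
df = spark.createDataFrame(pd_df1)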
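
If the data can stay in pandas, converting may not be necessary. A rough pandas counterpart of withColumn is DataFrame.assign, which returns a new DataFrame with the extra column and leaves the original unchanged. The frame and column names below are made up for illustration.

```python
import pandas as pd

# Hypothetical data for this example.
pd_df = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 1]})

# In Spark this would be something like df.withColumn("total", ...).
# In pandas, assign adds the column and returns a new DataFrame.
with_total = pd_df.assign(total=pd_df["price"] * pd_df["qty"])

print(with_total)
```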

Summary

That’s all about this issue. I hope these solutions helped you. Comment below with your thoughts and questions, and let me know which solution worked for you. Thank you.
