
[Solved] pyspark error: AttributeError: ‘SparkSession’ object has no attribute ‘parallelize’

Hello guys, how are you all? Hope you all are fine. Today I got the following error in Python: pyspark error: AttributeError: ‘SparkSession’ object has no attribute ‘parallelize’. Here I explain all the possible solutions.

Without wasting your time, let’s start this article and solve this error.

How Does the pyspark error: AttributeError: ‘SparkSession’ object has no attribute ‘parallelize’ Error Occur?

This error occurs when you call parallelize on a SparkSession object, for example spark.parallelize([...]). parallelize is a method of SparkContext, not SparkSession, so Python raises an AttributeError.

How To Solve pyspark error: AttributeError: ‘SparkSession’ object has no attribute ‘parallelize’ Error?

To solve this error, remember that SparkSession is not a replacement for SparkContext but an equivalent of SQLContext. Call parallelize on spark.sparkContext, and use the SparkSession the same way you used to use SQLContext. Both solutions below follow this idea.

Solution 1

SparkSession is not a replacement for SparkContext but an equivalent of SQLContext. Just use it the same way you used to use SQLContext:

spark.createDataFrame(...)

and if you ever have to access SparkContext, use the sparkContext attribute:

spark.sparkContext

and if you need SQLContext for backwards compatibility, you can create one from the session:

SQLContext(sparkContext=spark.sparkContext, sparkSession=spark)

Solution 2

Whenever we create a DataFrame from a backward-compatible object such as an RDD, or from a DataFrame created by a SparkSession, we need to make our SQLContext aware of that session and its context.

For example, if I create an RDD:

ss=SparkSession.builder.appName("vivek").master('local').config("k1","vi").getOrCreate()

rdd=ss.sparkContext.parallelize([('Alex',21),('Bob',44)])

But if we wish to create a DataFrame from this RDD through an SQLContext, we first need to make the SQLContext session-aware:

sq=SQLContext(sparkContext=ss.sparkContext, sparkSession=ss)

Only then can we use this SQLContext with the RDD created above:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)])
df = sq.createDataFrame(rdd, schema)
df.collect()

Summary

That’s all about this issue. Hope these solutions helped you a lot. Comment below with your thoughts and queries, and let us know which solution worked for you. Thank you.
