
[Solved] ‘GroupedData’ object has no attribute ‘show’ when doing a pivot on a Spark DataFrame

Hello everyone. Today I ran into the following error in Python: ‘GroupedData’ object has no attribute ‘show’ when doing a pivot on a Spark DataFrame. In this article I explain the possible solutions.

Without wasting your time, let’s start this article and solve the error.

How Does the ‘GroupedData’ object has no attribute ‘show’ Error Occur?

The error occurs in PySpark when show() is called directly on the result of groupBy() or pivot(), for example df.groupBy("name").show(). Both methods return a GroupedData object, not a DataFrame.

How To Solve the ‘GroupedData’ object has no attribute ‘show’ Error?

  1. How do I solve the 'GroupedData' object has no attribute 'show' error when doing a pivot on a Spark DataFrame?

    The pivot() method returns a GroupedData object, just like groupBy(). Apply an aggregate function (such as sum() or count()) to the GroupedData object first; the aggregation returns a DataFrame, and show() works on that.

Solution 1

The pivot() method returns a GroupedData object, just like groupBy(). GroupedData has no show() method, so you must first apply an aggregate function (such as sum() or count()) to it. The aggregation returns a DataFrame, which you can then display with show().

Solution 2

Let’s create some test data that resembles your dataset:

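from pyspark.sql import SparkSession

# Reuse the active session, or create one if none exists
spark = SparkSession.builder.getOrCreate()

data = [
    ("123", "McDonalds"),
    ("123", "Starbucks"),
    ("123", "McDonalds"),
    ("777", "McDonalds"),
    ("777", "McDonalds"),
    ("777", "Dunkin")
]
df = spark.createDataFrame(data, ["customer_id", "name"])
df.show()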
+-----------+---------+
|customer_id|     name|
+-----------+---------+
|        123|McDonalds|
|        123|Starbucks|
|        123|McDonalds|
|        777|McDonalds|
|        777|McDonalds|
|        777|   Dunkin|
+-----------+---------+

Let’s pivot the dataset so the customer_ids are columns:

df.groupBy("name").pivot("customer_id").count().show()

+---------+----+----+
|     name| 123| 777|
+---------+----+----+
|McDonalds|   2|   2|
|Starbucks|   1|null|
|   Dunkin|null|   1|
+---------+----+----+

Now let’s pivot the DataFrame so the restaurant names are columns:

df.groupBy("customer_id").pivot("name").count().show()

+-----------+------+---------+---------+
|customer_id|Dunkin|McDonalds|Starbucks|
+-----------+------+---------+---------+
|        777|     1|        2|     null|
|        123|  null|        2|        1|
+-----------+------+---------+---------+

Code like df.groupBy("name").show() fails with AttributeError: 'GroupedData' object has no attribute 'show'. On a GroupedData instance you can only call the methods defined in the pyspark.sql.GroupedData class, such as count(), sum(), or agg(), and show() is not one of them.

Summary

That’s all about this issue. I hope these solutions helped you. Comment below with your thoughts and questions, and let me know which solution worked for you. Thank you.
