
How to count unique ID after groupBy in pyspark

Hello guys, hope you are all doing well. Today we are going to learn how to count the unique IDs in each group after a groupBy in PySpark. Two methods are explained below.

Let's get started.

Table of Contents

How to count unique ID after groupBy in pyspark?

  1. How to count unique ID after groupBy in pyspark?

    Use countDistinct inside agg, or group twice on the original DataFrame (y in the examples below):
    y.groupBy("year", "id").count().groupBy("year").count()
    This query returns the number of unique students per year.


Method 1

Use the countDistinct function from pyspark.sql.functions inside agg:

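from pyspark.sql import SparkSession
from pyspark.sql.functions import countDistinct

spark = SparkSession.builder.getOrCreate()  # reuse the active session or create one
x = [("2001","id1"),("2002","id1"),("2002","id1"),("2001","id1"),("2001","id2"),("2001","id2"),("2002","id2")]
y = spark.createDataFrame(x, ["year", "id"])

# Count the distinct ids within each year
gr = y.groupBy("year").agg(countDistinct("id"))
gr.show()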

Output (row order may vary):

+----+------------------+
|year|count(DISTINCT id)|
+----+------------------+
|2002|                 2|
|2001|                 2|
+----+------------------+
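
To sanity-check the table above without a Spark session, the same per-group distinct count can be modelled in plain Python. This is only an illustrative sketch and not how Spark executes the query: distinct_ids_per_year is a hypothetical helper, not part of PySpark, and it reuses the sample rows from the example.

```python
from collections import defaultdict

# Same sample rows as the PySpark example.
rows = [("2001", "id1"), ("2002", "id1"), ("2002", "id1"), ("2001", "id1"),
        ("2001", "id2"), ("2001", "id2"), ("2002", "id2")]

def distinct_ids_per_year(pairs):
    """Hypothetical helper: gather the set of ids seen per year, then take its size."""
    seen = defaultdict(set)
    for year, id_ in pairs:
        seen[year].add(id_)  # a set ignores repeated ids
    return {year: len(ids) for year, ids in seen.items()}

result = distinct_ids_per_year(rows)
print(result)
```

Both years map to 2, which agrees with the count(DISTINCT id) column shown by gr.show().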

Method 2

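You can also group twice. Start from the original DataFrame y, not the aggregated gr from Method 1, because gr no longer has an id column:

y.groupBy("year", "id").count().groupBy("year").count()

The first groupBy collapses duplicate (year, id) pairs into one row each, and the second counts those rows per year. The result is the number of unique students per year.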
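
To see why chaining two groupBy calls yields a distinct count, here is a rough plain-Python model of the two stages. It is an assumption-laden sketch for intuition only: Counter stands in for groupBy(...).count(), and the variable names pair_counts and ids_per_year are made up for this illustration.

```python
from collections import Counter

# Same sample rows as the PySpark example.
rows = [("2001", "id1"), ("2002", "id1"), ("2002", "id1"), ("2001", "id1"),
        ("2001", "id2"), ("2001", "id2"), ("2002", "id2")]

# Stage 1, like groupBy("year", "id").count():
# one entry per distinct (year, id) pair, with how often it occurred.
pair_counts = Counter(rows)

# Stage 2, like groupBy("year").count():
# count how many distinct pairs remain for each year.
ids_per_year = Counter(year for year, _id in pair_counts)

print(dict(pair_counts))
print(dict(ids_per_year))
```

After stage 1 every (year, id) pair appears exactly once as a key, so counting keys per year in stage 2 is the same as counting unique ids per year.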

Summary

That is all about counting unique IDs after groupBy in PySpark. Hope these methods helped you. Comment below with your thoughts and questions, and let us know which method worked for you. Thank you.
