
How to calculate date difference in PySpark?

Hello guys, how are you all? I hope you are all fine. Today we are going to learn how to calculate a date difference in PySpark. Here I explain two methods.

Without wasting your time, let's start this article.

Method 1

You need to convert the low column to a date and then use datediff() together with lit(), which wraps the fixed reference date as a column. With Spark 2.2 or later, to_date() accepts a format string:

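from pyspark.sql.functions import datediff, to_date, lit

# datediff(end, start) returns the number of days from start to end
df.withColumn("test",
              datediff(to_date(lit("2017-05-02")),  # fixed reference date
                       to_date("low", "yyyy/MM/dd"))).show()  # parse "low" with its format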
+----------+----+------+-----+
|       low|high|normal| test|
+----------+----+------+-----+
|1986/10/15|   z|  null|11157|
|1986/10/15|   z|  null|11157|
|1986/10/15|   c|  null|11157|
|1986/10/15|null|  null|11157|
|1986/10/16|null|   4.0|11156|
+----------+----+------+-----+
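As a sanity check on the test column, here is a plain-Python sketch (standard library only, not PySpark) of the same arithmetic. datediff(end, start) counts whole days from start to end, so the result is positive when end is later and negative when the arguments are swapped. The helper name days_between is hypothetical, and "%Y/%m/%d" is the strptime equivalent of Spark's "yyyy/MM/dd" pattern.

```python
from datetime import date, datetime


def days_between(end: date, start: date) -> int:
    """Plain-Python analogue of Spark's datediff(end, start): whole days from start to end."""
    return (end - start).days


reference = date(2017, 5, 2)

# Parse the "low" strings the same way to_date("low", "yyyy/MM/dd") does
for low in ["1986/10/15", "1986/10/16"]:
    start = datetime.strptime(low, "%Y/%m/%d").date()
    print(low, days_between(reference, start))

# Swapping the arguments flips the sign of the result
print(days_between(datetime.strptime("1986/10/15", "%Y/%m/%d").date(), reference))
```

Running it prints 11157 and 11156, matching the test column above, followed by -11157 for the swapped call.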

For Spark versions earlier than 2.2, where to_date() takes no format argument, first parse the low column with unix_timestamp() and cast it to a timestamp:

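from pyspark.sql.functions import datediff, to_date, lit, unix_timestamp

# unix_timestamp() parses "low" into seconds since the epoch; casting to timestamp
# and applying to_date() then yields a date that datediff() can use
df.withColumn("test",
              datediff(to_date(lit("2017-05-02")),
                       to_date(unix_timestamp("low", "yyyy/MM/dd").cast("timestamp")))).show()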
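The pre-2.2 expression goes through three steps: string to epoch seconds, seconds to timestamp, timestamp to date. The plain-Python sketch below (standard library only, not PySpark) mirrors those steps one by one. It assumes UTC throughout, whereas Spark interprets the string in the session time zone; the helper name to_date_via_unix_timestamp is hypothetical.

```python
import calendar
import time
from datetime import date, datetime, timezone


def to_date_via_unix_timestamp(s: str, fmt: str = "%Y/%m/%d") -> date:
    """Mimic to_date(unix_timestamp(s, fmt).cast("timestamp")) in plain Python.

    Assumption: the string is interpreted as UTC (Spark uses the session time zone).
    """
    # Step 1: string -> seconds since the Unix epoch (like unix_timestamp)
    seconds = calendar.timegm(time.strptime(s, fmt))
    # Step 2: seconds -> timestamp (like .cast("timestamp"))
    ts = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Step 3: timestamp -> date, dropping the time of day (like to_date)
    return ts.date()


start = to_date_via_unix_timestamp("1986/10/15")
print(start, (date(2017, 5, 2) - start).days)
```

It prints 1986-10-15 and 11157, the same value the Spark 2.2 version produces.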

Method 2

Alternatively, to find the number of days that passed between two consecutive actions of the same user, combine datediff() with lag() over a window partitioned by user_id and ordered by action_date:

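import pyspark.sql.functions as funcs
from pyspark.sql.window import Window

# One window per user, with that user's rows ordered by date
window = Window.partitionBy('user_id').orderBy('action_date')

# lag() fetches the previous action_date in the window (null for a user's first row)
df = df.withColumn("days_passed", funcs.datediff(df.action_date,
                                  funcs.lag(df.action_date, 1).over(window)))
df.show()
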
+-------+-----------+-----------+
|user_id|action_date|days_passed|
+-------+-----------+-----------+
|    623| 2015-10-21|       null|
|    623| 2015-11-19|         29|
|    623| 2016-01-13|         55|
|    623| 2016-01-21|          8|
|    623| 2016-03-24|         63|
+-------+-----------+-----------+
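To see what the window does, here is a plain-Python sketch (standard library only, not PySpark) of the same logic: rows are grouped per user, sorted by date, and each date is compared with the previous one in its group, yielding None where lag() would return null. The function name days_passed mirrors the column name, and user 700 is a hypothetical second user added only to show that the calculation restarts for each partition.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter
from typing import List, Optional, Tuple


def days_passed(rows: List[Tuple[int, date]]) -> List[Tuple[int, date, Optional[int]]]:
    """Plain-Python analogue of datediff(action_date, lag(action_date, 1))
    over Window.partitionBy('user_id').orderBy('action_date')."""
    result = []
    # Sorting by (user_id, action_date) gives groupby contiguous partitions
    # and applies the orderBy('action_date') within each one
    ordered = sorted(rows, key=itemgetter(0, 1))
    for user_id, group in groupby(ordered, key=itemgetter(0)):
        previous: Optional[date] = None  # lag() is null for a partition's first row
        for _, action_date in group:
            diff = None if previous is None else (action_date - previous).days
            result.append((user_id, action_date, diff))
            previous = action_date
    return result


rows = [
    (623, date(2015, 10, 21)),
    (623, date(2015, 11, 19)),
    (623, date(2016, 1, 13)),
    (623, date(2016, 1, 21)),
    (623, date(2016, 3, 24)),
    (700, date(2016, 2, 1)),   # hypothetical second user
    (700, date(2016, 2, 11)),
]

for row in days_passed(rows):
    print(row)
```

For user 623 the differences come out as None, 29, 55, 8 and 63; for the hypothetical user 700 they are None and 10, because the previous date resets at each new user.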

Summary

That's all about this issue. I hope these methods helped you. Comment below with your thoughts and queries, and let me know which method worked for you. Thank you.
