# How to calculate date difference in pyspark?


Convert the `low` column to a date, then use `datediff()` together with `lit()`. Note that `datediff(end, start)` returns the number of days from `start` to `end`. In Spark 2.2 and later, `to_date()` accepts a format string:

## Method 1

```python
from pyspark.sql.functions import datediff, to_date, lit

# Days from each parsed `low` date up to the fixed date 2017-05-02
df.withColumn(
    "test",
    datediff(to_date(lit("2017-05-02")), to_date("low", "yyyy/MM/dd")),
).show()
```

```
+----------+----+------+-----+
|       low|high|normal| test|
+----------+----+------+-----+
|1986/10/15|   z|  null|11157|
|1986/10/15|   z|  null|11157|
|1986/10/15|   c|  null|11157|
|1986/10/15|null|  null|11157|
|1986/10/16|null|   4.0|11156|
+----------+----+------+-----+
```
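To sanity-check the `test` column without a Spark session, the same arithmetic can be reproduced with Python's standard library. This is only a sketch: `days_between` is a hypothetical helper, not part of PySpark, and it assumes the Spark pattern `yyyy/MM/dd` corresponds to the `strptime` pattern `%Y/%m/%d`.

```python
from datetime import date, datetime


def days_between(end: str, start: str, start_fmt: str = "%Y/%m/%d") -> int:
    """Hypothetical helper mirroring datediff(end, start): days from start to end."""
    end_date = date.fromisoformat(end)                        # e.g. "2017-05-02"
    start_date = datetime.strptime(start, start_fmt).date()   # e.g. "1986/10/15"
    return (end_date - start_date).days


print(days_between("2017-05-02", "1986/10/15"))  # 11157
print(days_between("2017-05-02", "1986/10/16"))  # 11156
```

As with `datediff()`, swapping the two arguments flips the sign of the result.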

Before Spark 2.2, `to_date()` does not accept a format string, so the `low` column has to be parsed into a timestamp first:

```python
from pyspark.sql.functions import datediff, to_date, lit, unix_timestamp

# Parse `low` with unix_timestamp(), cast to timestamp, then truncate to a date
df.withColumn(
    "test",
    datediff(
        to_date(lit("2017-05-02")),
        to_date(unix_timestamp("low", "yyyy/MM/dd").cast("timestamp")),
    ),
).show()
```

## Method 2

Alternatively, to find the number of days between consecutive actions by the same user, combine `datediff()` with `lag()` over a window:

```python
import pyspark.sql.functions as funcs
from pyspark.sql.window import Window

# One window per user, ordered by action date
window = Window.partitionBy("user_id").orderBy("action_date")

# lag() fetches the previous action_date in the same window (null for the first row)
df = df.withColumn(
    "days_passed",
    funcs.datediff(df.action_date, funcs.lag(df.action_date, 1).over(window)),
)
df.show()
```

```
+-------+-----------+-----------+
|user_id|action_date|days_passed|
+-------+-----------+-----------+
|    623| 2015-10-21|       null|
|    623| 2015-11-19|         29|
|    623| 2016-01-13|         55|
|    623| 2016-01-21|          8|
|    623| 2016-03-24|         63|
+-------+-----------+-----------+
```
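To make the window logic concrete, here is a rough plain-Python equivalent of `lag()` over a window partitioned by `user_id` and ordered by `action_date`. It is a sketch for illustration, not how Spark executes the query, and `days_passed` here is a hypothetical function name.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def days_passed(rows):
    """Return (user_id, action_date, gap) tuples, gap = days since the user's previous action."""
    ordered = sorted(rows, key=itemgetter(0, 1))  # partition key first, then order key
    result = []
    for user_id, group in groupby(ordered, key=itemgetter(0)):
        previous = None  # the first row of each partition has no predecessor
        for _, action_date in group:
            gap = None if previous is None else (action_date - previous).days
            result.append((user_id, action_date, gap))
            previous = action_date
    return result


# Same dates as in the example, deliberately out of order
rows = [
    (623, date(2016, 1, 13)),
    (623, date(2015, 10, 21)),
    (623, date(2016, 3, 24)),
    (623, date(2015, 11, 19)),
    (623, date(2016, 1, 21)),
]
for user_id, action_date, gap in days_passed(rows):
    print(user_id, action_date, gap)
```

The first row of every partition gets `None`, matching the `null` that `lag()` produces when there is no preceding row.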


