How to create a copy of a dataframe in pyspark?

Hello guys, how are you all? Hope you are all fine. Today we are going to learn how to create a copy of a DataFrame in PySpark. Here I explain two methods you can use.

Without wasting your time, let's start this article.

Method 1

You can make a deep copy of the initial schema, modify that copy, and use it to initialize the new DataFrame _X. In this example, the copied schema gets an extra id_col column that is filled with each row's index:

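import copy
from pyspark.sql import SparkSession
from pyspark.sql.types import LongType

spark = SparkSession.builder.getOrCreate()

X = spark.createDataFrame([[1, 2], [3, 4]], ['a', 'b'])
# Deep-copy the schema so that adding a field leaves X.schema untouched
_schema = copy.deepcopy(X.schema)
_schema.add('id_col', LongType(), False)  # modifies the copy in place
# zipWithIndex pairs each Row with its index; append the index as id_col
_X = X.rdd.zipWithIndex().map(lambda l: list(l[0]) + [l[1]]).toDF(_schema)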

Now let’s check:

print('Schema of X: ' + str(X.schema))
print('Schema of _X: ' + str(_X.schema))

Output (the exact formatting of StructType varies between Spark versions):

Schema of X: StructType(List(StructField(a,LongType,true),StructField(b,LongType,true)))
Schema of _X: StructType(List(StructField(a,LongType,true),StructField(b,LongType,true),StructField(id_col,LongType,false)))

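Note that, because DataFrames are immutable, you can often just use _X = X. This does not copy anything (both names refer to the same DataFrame object), but that is rarely a problem: transformations such as withColumn never alter a DataFrame in place and always return a new DataFrame. Hope this helps!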

Method 2

The Column.alias() method is commonly used to rename columns, but DataFrame also has an alias() method. It returns a new DataFrame object with an alias set, which gives you a separate object to work with:

df2 = df.alias('df2')
id(df2) == id(df)  # False

Summary

That's all about this issue. I hope these methods helped you. Share your thoughts and questions in the comments below, and let us know which method worked for you. Thank you!
