close

How to change a dataframe column from String type to Double type in PySpark?

Hello Guys, How are you all? Hope You all Are Fine. Today We Are Going To learn about How to change a dataframe column from String type to Double type in PySpark in Python. So Here I am Explain to you all the possible Methods here.

Without wasting your time, Let’s start This Article.

Table of Contents

How to change a dataframe column from String type to Double type in PySpark?

  1. How to change a dataframe column from String type to Double type in PySpark?

    There is no need for an UDF here. Column already provides cast method with DataType instance :
    from pyspark.sql.types import DoubleType

  2. change a dataframe column from String type to Double type in PySpark

    There is no need for an UDF here. Column already provides cast method with DataType instance :
    from pyspark.sql.types import DoubleType

Method 1

There is no need for an UDF here. Column already provides cast method with DataType instance :

from pyspark.sql.types import DoubleType

changedTypedf = joindf.withColumn("label", joindf["show"].cast(DoubleType()))

or short string:

changedTypedf = joindf.withColumn("label", joindf["show"].cast("double"))

where canonical string names (other variations can be supported as well) correspond to simpleString value. So for atomic types:

from pyspark.sql import types 

for t in ['BinaryType', 'BooleanType', 'ByteType', 'DateType', 
          'DecimalType', 'DoubleType', 'FloatType', 'IntegerType', 
           'LongType', 'ShortType', 'StringType', 'TimestampType']:
    print(f"{t}: {getattr(types, t)().simpleString()}")
BinaryType: binary
BooleanType: boolean
ByteType: tinyint
DateType: date
DecimalType: decimal(10,0)
DoubleType: double
FloatType: float
IntegerType: int
LongType: bigint
ShortType: smallint
StringType: string
TimestampType: timestamp

and for example complex types

types.ArrayType(types.IntegerType()).simpleString()   
'array<int>'
types.MapType(types.StringType(), types.IntegerType()).simpleString()
'map<string,int>'

Method 2

Preserve the name of the column and avoid extra column addition by using the same name as input column:

from pyspark.sql.types import DoubleType
changedTypedf = joindf.withColumn("show", joindf["show"].cast(DoubleType()))

Conclusion

It’s all About this issue. Hope all Methods helped you a lot. Comment below Your thoughts and your queries. Also, Comment below which Method worked for you? Thank You.

Also, Read