Python - Pass an operator using literal string?


I have a dictionary of column names (keys) and their data types (values). The data types are literal strings, and I'm trying to cast the columns in my PySpark df to the corresponding types, i.e.

for k, v in dtypes.items():
    df.withColumn(f'{k}', col(f'{k}').cast(v))

Obviously the above doesn't work, because the string 'ByteType()' is not the same thing as the ByteType() object. Does anyone have a creative workaround for this?
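A minimal sketch of one possible workaround (not from the thread), assuming the dictionary values are literal strings like 'ByteType()': resolve each string to a real DataType instance through a lookup table before casting. The names dtype_lookup and dtypes below are illustrative.

from pyspark.sql.functions import col
from pyspark.sql.types import ByteType, IntegerType, FloatType, StringType

# Hypothetical lookup table from the literal strings to DataType instances
dtype_lookup = {
    'ByteType()': ByteType(),
    'IntegerType()': IntegerType(),
    'FloatType()': FloatType(),
    'StringType()': StringType(),
}

dtypes = {'age': 'ByteType()', 'name': 'StringType()'}  # illustrative input

for k, v in dtypes.items():
    # look up the real DataType, then cast and reassign the result
    df = df.withColumn(k, col(k).cast(dtype_lookup[v]))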

CodePudding user response:

from pyspark.sql.functions import col
from pyspark.sql.types import *  # don't forget the imports

# Solution 1
for k, v in dtypes.items():
    # withColumn returns a new DataFrame, so reassign the result to df
    df = df.withColumn(f'{k}', col(f'{k}').cast(v))
df.printSchema()  # check the DataFrame's schema

Solution 2. I don't know your columns' datatypes, but if a column's values fall outside the -128 to 127 range that ByteType() can hold, you can cast it to a wider type such as FloatType() instead.
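As a side note (not from the original answer), Column.cast also accepts plain type-name strings, so if you can store simple names like 'byte' or 'float' in the dictionary instead of 'ByteType()', the cast works directly. The dictionary below is an illustrative assumption.

from pyspark.sql.functions import col

# Hypothetical dict using the simple type names that cast() understands
dtypes = {'age': 'byte', 'score': 'float'}

for k, v in dtypes.items():
    df = df.withColumn(k, col(k).cast(v))  # 'byte', 'float', etc. work directly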

CodePudding user response:

After reading the comments, it seems that you simply want to cast one dataframe's columns to the datatypes of another.

You can do it like this:

df2.select(*[F.col(c).cast(t) for c, t in df1.dtypes])

Full example:

from pyspark.sql import functions as F

df1 = spark.createDataFrame([('1', '2')], ['c1', 'c2'])
print(df1.dtypes)
# [('c1', 'string'), ('c2', 'string')]

df2 = spark.createDataFrame([(1, 2)], ['c1', 'c2'])
print(df2.dtypes)
# [('c1', 'bigint'), ('c2', 'bigint')]

# cast df2's columns to df1's datatypes, column by column
df2 = df2.select(*[F.col(c).cast(t) for c, t in df1.dtypes])
print(df2.dtypes)
# [('c1', 'string'), ('c2', 'string')]