Pyspark: TypeError: int is required...got type column

Time:10-23

I am working with a nested Json structure. I created a dataframe and added a column by doing:

jsonDf = jsonDf.withColumn("REPORT_TIMESTAMP", to_timestamp(jsonDf.reportData.timestamp))

All good until here. Next thing I need to do is derive the year from "REPORT_TIMESTAMP". I have tried various approaches, for instance:

jsonDf.withColumn("YEAR", datetime.fromtimestamp(to_timestamp(jsonDf.reportData.timestamp).cast("integer")))

that ended with "TypeError: an integer is required (got type Column)".

I also tried:

jsonDf.withColumn("YEAR", datetime.date.to_timestamp(jsonDf.reportData.timestamp).year)

that gave me "AttributeError: 'method_descriptor' object has no attribute 'to_timestamp'".

Can anyone please correct my two previous approaches so they work, or suggest another option that I don't have on my radar yet? Thanks so much in advance.

CodePudding user response:

You're mixing Python's datetime API (datetime.date has no to_timestamp method, hence the AttributeError) with PySpark's column functions, which operate on Column objects rather than Python values (hence the TypeError). Stay inside PySpark: once REPORT_TIMESTAMP is a timestamp column, extract the year with pyspark.sql.functions.year.

It's just as simple as .withColumn('YEAR', F.year('REPORT_TIMESTAMP'))
