How to convert datetime to int on pyspark

Time:08-12

df['DATE'].apply(lambda x: x.strftime("%Y%m%d")).astype('float64')

raises the error

TypeError: 'Column' object is not callable

How would I convert this syntax to comply with pyspark?

CodePudding user response:

The pandas `.apply` and `.astype` methods don't exist on a PySpark `Column`; accessing `.apply` resolves to nested-field access and returns another `Column`, so calling it raises `'Column' object is not callable`. A simple way to reformat 'yyyy-MM-dd' to 'yyyyMMdd' is to strip the dashes and cast:

from pyspark.sql.functions import col, regexp_replace

data= [
    ('2022-08-10', 1),
    ('2022-08-09', 2),
]

df = spark.createDataFrame(data, ['DATE','idx'])
df.printSchema()
# root
#  |-- DATE: string (nullable = true)
#  |-- idx: long (nullable = true)

df = df.withColumn('DATE', regexp_replace(col('DATE'), '-', '').cast('long'))
df.printSchema()
# root
#  |-- DATE: long (nullable = true)
#  |-- idx: long (nullable = true)

df.show(10, False)
# +--------+---+
# |DATE    |idx|
# +--------+---+
# |20220810|1  |
# |20220809|2  |
# +--------+---+