Extract year and month as string in Pyspark from date column

Time:10-19

With Pandas, I can extract the year and month simply by using: tb['yearmon'] = tb['date'].apply(lambda x: x.strftime('%Y%m'))

How can I do this in Pyspark?

CodePudding user response:

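This should do what you want. Use the date_format function built into PySpark's SQL functions to render the year and month as a single string separated by "-". For the separator-free form that strftime('%Y%m') produces, use the pattern "yyyyMM" instead.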

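from pyspark.sql import SparkSession
from pyspark.sql.functions import date_format
spark = SparkSession.builder.getOrCreate()  # reuses the active session if one exists
df = spark.createDataFrame([('2015-04-08',)], ['date'])
# Format the date column as "yyyy-MM"; returns [Row(yearmon='2015-04')]
df.select(date_format("date", "yyyy-MM").alias("yearmon")).collect()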