With pandas, I can extract the year and month simply by using:
tb['yearmon'] = tb['date'].apply(lambda x: x.strftime('%Y%m'))
How can I do this in Pyspark?
CodePudding user response:
This should work as you want it. Basically, use the SQL functions built into PySpark: date_format extracts the year and month in one step and joins them with "-".
from pyspark.sql.functions import date_format

df = spark.createDataFrame([('2015-04-08',)], ['date'])
# 'yyyy-MM' formats the date as year and month separated by a dash
df.select(date_format('date', 'yyyy-MM').alias('yearmon')).collect()
# [Row(yearmon='2015-04')]
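If you want to match the pandas strftime('%Y%m') output exactly (no separator) and attach it as a new column, as in the question, a minimal sketch using withColumn (column names tb and yearmon taken from the question):

from pyspark.sql import SparkSession
from pyspark.sql.functions import date_format

spark = SparkSession.builder.getOrCreate()
tb = spark.createDataFrame([('2015-04-08',)], ['date'])

# 'yyyyMM' mirrors pandas strftime('%Y%m'), e.g. '201504'
tb = tb.withColumn('yearmon', date_format('date', 'yyyyMM'))
tb.show()
# +----------+-------+
# |      date|yearmon|
# +----------+-------+
# |2015-04-08| 201504|
# +----------+-------+

Note that date_format uses Java-style patterns, so uppercase 'MM' means month while lowercase 'mm' would mean minutes.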