Using date_format, we can extract the month name from a date:
from pyspark.sql import functions as F
df = spark.createDataFrame([('2021-05-01',),('2021-06-01',)], ['c1']).select(F.col('c1').cast('date'))
df = df.withColumn('month', F.date_format('c1', 'LLLL'))
df.show()
#+----------+-----+
#|        c1|month|
#+----------+-----+
#|2021-05-01|  May|
#|2021-06-01| June|
#+----------+-----+
It's in English, but I would like to get it in French.
I have found that Spark is aware of month names in French!
spark.sql("select to_csv(named_struct('date', date '1970-06-01'), map('dateFormat', 'LLLL', 'locale', 'FR'))").show()
#+---------------------------------------------+
#|to_csv(named_struct(date, DATE '1970-06-01'))|
#+---------------------------------------------+
#|                                         juin|
#+---------------------------------------------+
But I cannot find a way to make date_format accept another locale. How can these two features be combined to produce the following result?
+----------+-----+
|        c1|month|
+----------+-----+
|2021-05-01|  mai|
|2021-06-01| juin|
+----------+-----+
Answer:
Credit for this approach goes to another user: wrap the date column in a struct and pass it to to_csv, which accepts a locale option, so the month name comes back in another language:
df = df.withColumn('month', F.to_csv(F.struct('c1'), {'dateFormat': 'LLLL', 'locale': 'fr'}))
df.show()
#+----------+-----+
#|        c1|month|
#+----------+-----+
#|2021-05-01|  mai|
#|2021-06-01| juin|
#+----------+-----+
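If the locale needs to vary, the same call can be wrapped in a small helper. This is only a sketch: the names month_name_options and with_localized_month are hypothetical and not part of PySpark. It assumes Spark 3.0 or later (where F.to_csv is available) and a DateType input column; for a timestamp column the relevant CSV option would be timestampFormat rather than dateFormat.

```python
def month_name_options(locale, pattern="LLLL"):
    """Build the CSV options that make to_csv render a date as a month name.

    Hypothetical helper, not part of PySpark.
    """
    # 'dateFormat' controls how DateType values are rendered;
    # 'locale' is an IETF BCP 47 language tag such as 'fr' or 'de'.
    return {"dateFormat": pattern, "locale": locale}


def with_localized_month(df, date_col, out_col="month", locale="fr"):
    """Return df with out_col holding the month name of date_col in the given locale.

    Hypothetical helper; assumes date_col is a DateType column.
    """
    # Imported here so the options helper above works without a Spark installation.
    from pyspark.sql import functions as F

    # to_csv expects a struct column, so the single date column is wrapped in one.
    return df.withColumn(
        out_col,
        F.to_csv(F.struct(date_col), month_name_options(locale)),
    )
```

With the df from the question, with_localized_month(df, 'c1', locale='fr') makes the same to_csv call as the one-liner above, so it should give the same table.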