I need to create a new column (FILE_DT) and apply a constant value to all rows after reading this CSV file as a PySpark DataFrame.
For example, with the constant value 2022-10-01, the expected DataFrame is:
NAME INFO TITLE FILE_DT
AAA 222 BBB 2022-10-01
ACC 111 CCB 2022-10-01
ADD 333 DDC 2022-10-01
ASS 444 NNC 2022-10-01
CodePudding user response:
I tried the code below. It works, but I'm looking for better logic.
from datetime import datetime
from pyspark.sql.functions import lit

object_name = "ORDERS_E220928_D220928.csv"
current_day = datetime.today().strftime("%Y%m%d")
# Current year plus the MMDD portion embedded in the file name
filedate = current_day[0:4] + object_name[18:22]
print(filedate)  # e.g. 20220928 for this file name
d = datetime.strptime(filedate, "%Y%m%d")
s = d.strftime("%Y-%m-%d")
df.withColumn("file_dt", lit(s)).show()
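One way to make this logic more robust is to extract the date token from the file name with a regular expression instead of hard-coded slice positions, which silently break if the prefix length changes. A minimal sketch (the `_D` pattern is an assumption based on the sample file name shown above):

```python
import re
from datetime import datetime

# Pull the 6-digit date that follows "_D" in the file name,
# rather than relying on fixed character positions.
object_name = "ORDERS_E220928_D220928.csv"
match = re.search(r"_D(\d{6})\.csv$", object_name)
file_dt = datetime.strptime(match.group(1), "%y%m%d").strftime("%Y-%m-%d")
print(file_dt)  # 2022-09-28
```

The resulting string can then be attached with `df.withColumn("FILE_DT", lit(file_dt))` as in the code above.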
CodePudding user response:
The simplest way:
import pyspark.sql.functions as F
df_with_date = df.withColumn("FILE_DT", F.lit("2022-10-01").cast("date"))