How to create a new column with a constant value in a PySpark DataFrame?


I need to create a new column (FILE_DT) and apply a constant value to all rows after reading a CSV file as a PySpark DataFrame.

For example, with the constant value 2022-10-01, the expected DataFrame is:

NAME   INFO   TITLE   FILE_DT
AAA    222     BBB    2022-10-01
ACC    111     CCB    2022-10-01  
ADD    333     DDC    2022-10-01
ASS    444     NNC    2022-10-01

CodePudding user response:

I tried the code below. It works, but I'm looking for better logic.

from datetime import datetime
from pyspark.sql.functions import lit

object_name = "ORDERS_E220928_D220928.csv"
current_day = datetime.today().strftime("%Y%m%d")
# Year from today's date, MMDD from positions 18:22 of the file name
filedate = current_day[0:4] + object_name[18:22]
print(filedate)    # e.g. 20220928 when run in 2022

d = datetime.strptime(filedate, "%Y%m%d")
s = d.strftime("%Y-%m-%d")
print(s)           # e.g. 2022-09-28
df.withColumn("file_dt", lit(s)).show()
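As a sketch of an alternative, the hard-coded slice `object_name[18:22]` could be replaced by a regular expression that pulls the date token out of the file name; this assumes the trailing `_DYYMMDD` pattern seen in the sample name actually holds for all files:

```python
import re
from datetime import datetime

object_name = "ORDERS_E220928_D220928.csv"

# Assumption: the trailing _DYYMMDD token carries the file date
match = re.search(r"_D(\d{6})\.csv$", object_name)
file_dt = datetime.strptime(match.group(1), "%y%m%d").strftime("%Y-%m-%d")
print(file_dt)  # 2022-09-28
```

This avoids depending on exact character positions, so a longer or shorter prefix in the file name would not break the extraction.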

CodePudding user response:

The simplest way:

import pyspark.sql.functions as F

df_with_date = df.withColumn("FILE_DT", F.lit("2022-10-01").cast("date"))
