pyspark repeat function pass parameter dynamically based on column data length


Requirement: I need to populate a 4-digit row number padded with leading zeros.

Example: 0001, 0002, ..., 0011, 0012

Here I am repeating the number of zeros to prefix based on the length of the row number value, i.e. the value in column PAGENO:

df.select(F.repeat(F.lit(0), 3))

The value 3 needs to change dynamically based on the row number value.

My idea to achieve dynamic repetition of 0:

df.select(F.repeat(F.lit(0),(4 - F.length(df["PAGENO"]))))

but I am getting this error:

'Column' object is not callable

repeat should work when passing a column or another expression, instead of just the numeric literal 3, as the number of repetitions.

CodePudding user response:

You can pass a dynamic count by using repeat inside an SQL expression:

from pyspark.sql import functions as F

df.select(F.expr("repeat('0', 4 - length(PAGENO))")).show()
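If you also want to attach the generated prefix to PAGENO in one step, here is a minimal sketch (assuming the same df with an integer PAGENO column as in your question):

# repeat builds the zero prefix, concat glues it onto PAGENO
df.select(
    F.expr("concat(repeat('0', 4 - length(PAGENO)), PAGENO)").alias("PAGENO_2")
).show()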

However, if I've understood your question correctly, what you actually want is the lpad function. Here's an example:

df = spark.createDataFrame([(1,), (2,), (10,), (12,), (11,)], ["PAGENO"])

df1 = df.withColumn("PAGENO_2", F.expr("lpad(PAGENO, 4, '0')"))

df1.show()
# +------+--------+
# |PAGENO|PAGENO_2|
# +------+--------+
# |     1|    0001|
# |     2|    0002|
# |    10|    0010|
# |    12|    0012|
# |    11|    0011|
# +------+--------+
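The same padding can also be written with the DataFrame API instead of an SQL string. This is only a sketch under the same assumptions (the df created above); F.lpad takes the column, the target length, and the pad string:

# left-pad the string-cast PAGENO to 4 characters with '0'
df2 = df.withColumn("PAGENO_2", F.lpad(F.col("PAGENO").cast("string"), 4, "0"))
df2.show()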