Requirement: I need to produce a 4-digit row number, left-padded with zeros.
Example: 0001, 0002, ..., 0011, 0012
Here I am repeating the zero prefix a number of times that depends on the length of the row number value, i.e. the value in column PAGENO:
df.select(F.repeat(F.lit(0), 3))
The value 3 needs to change dynamically based on the row number value.
My idea to achieve the dynamic zero replication:
df.select(F.repeat(F.lit(0),(4 - F.length(df["PAGENO"]))))
But I am getting this error:
'Column' object is not callable
Passing a column or expression as the repeat count, instead of a literal number such as 3, should work.
CodePudding user response:
You can use it within an SQL expression:
df.select(F.expr("repeat(0, length(PAGENO))")).show()
However, if I've understood your question correctly, what you want is the lpad function. Here's an example:
df = spark.createDataFrame([(1,), (2,), (10,), (12,), (11,)], ["PAGENO"])
df1 = df.withColumn("PAGENO_2", F.expr("lpad(PAGENO, 4, '0')"))
df1.show()
#+------+--------+
#|PAGENO|PAGENO_2|
#+------+--------+
#|     1|    0001|
#|     2|    0002|
#|    10|    0010|
#|    12|    0012|
#|    11|    0011|
#+------+--------+