I have a Dataframe and need to remove leading zeros only for the negative type of values, remaining values to be same.
For example
----------- -----------------
| Input |output |
----------- -----------------
| 0000-12.45| -12.45 |
| 000012.45 | 000012.45 |
| 000$.00| 000$.00|
| 0$ | 0$ |
| 0. | 0. |
| 51.46 | 51.46 |
| -123.67 | -123.67 |
| 00012.45 | 00012.45|
| 012.45 | 012.45 |
I have tried the below way
spark.sql("""select regexp_replace("0000-12.45","^0 -(?!$)",'') as d,regexp_replace("000012.45","^0 -(?!$)",'') as d1,regexp_replace("0000.45","^0 -(?!$)",'') as d2,regexp_replace("0000$.00","^0 -(?!$)",'') as d3,regexp_replace("0.","^0 -(?!$)",'') as d4,regexp_replace("0$","^0 -(?!$)",'') as d5,regexp_replace("00","^0 -(?!$)",'') as d6,regexp_replace("51.46","^0 -(?!$)",'') as d7,regexp_replace("-12234.45","^0 -(?!$)",'') as d8, regexp_replace("0000-12234.45","^0 -(?!$)",'') as d9""").show()
----- --------- ------- -------- --- --- --- ----- --------- --------
| d| d1| d2| d3| d4| d5| d6| d7| d8| d9|
----- --------- ------- -------- --- --- --- ----- --------- --------
|12.45|000012.45|0000.45|0000$.00| 0.| 0$| 00|51.46|-12234.45|12234.45|
----- --------- ------- -------- --- --- --- ----- --------- --------
CodePudding user response:
You can add a condition to remove zeros only if it contains negative sign
(df
.withColumn('ouput', F
.when(F.col('input').contains('-'), F.regexp_replace('input', '^0 ', ''))
.otherwise(F.col('input'))
)
.show()
)
# ---------- ---------
# | input| ouput|
# ---------- ---------
# |0000-12.45| -12.45|
# | 000012.45|000012.45|
# | 000$.00| 000$.00|
# | 0$| 0$|
# | 0.| 0.|
# | 51.46| 51.46|
# | -123.67| -123.67|
# | 00012.45| 00012.45|
# | 012.45| 012.45|
# ---------- ---------