I am trying to split the data in a column on the '%' symbol, but some of the rows do not contain a '%' symbol.
Input data
+------------------+
|     Default_value|
+------------------+
|      10% OF VALUE|
|      20% OF VALUE|
|This is null VALUE|
|    0 is the value|
+------------------+
Expected output
+-----+------------------+
|value|       Description|
+-----+------------------+
|  10%|          OF VALUE|
|  20%|          OF VALUE|
|     |This is null VALUE|
|     |    0 is the value|
+-----+------------------+
I tried a regex on '%', but the rows that do not contain '%' end up in the 'value' column, and I want them in the 'Description' column instead.
CodePudding user response:
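You can use the regexp_extract function. It returns an empty string when the pattern does not match, so rows without '%' get an empty 'value'. For 'Description', the first group matches either everything up to the last '%' or nothing at all, and the second group captures whatever remains.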
from pyspark.sql.functions import regexp_extract
from pyspark.sql.types import StringType

# Assumes an active SparkSession named `spark`.
df = spark.createDataFrame(['10% OF VALUE', '20% OF VALUE', 'This is null VALUE', '0 is the value'], StringType()) \
    .toDF('Default_value')
df.withColumn('value', regexp_extract('Default_value', '.*%', 0)) \
    .withColumn('Description', regexp_extract('Default_value', '(.*%|.{0})(.*)', 2)).show()
+------------------+-----+------------------+
|     Default_value|value|       Description|
+------------------+-----+------------------+
|      10% OF VALUE|  10%|          OF VALUE|
|      20% OF VALUE|  20%|          OF VALUE|
|This is null VALUE|     |This is null VALUE|
|    0 is the value|     |    0 is the value|
+------------------+-----+------------------+
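
If you want to check the two patterns without a Spark session, here is a minimal sketch using Python's built-in re module. The helper name extract is hypothetical and only roughly imitates regexp_extract (the first match's group, or an empty string when nothing matches). Spark uses Java regular expressions, but these two simple patterns should behave the same way in Python.

```python
import re


def extract(pattern, text, group):
    """Hypothetical stand-in for regexp_extract: the requested group of the
    first match, or '' when the pattern does not match at all."""
    m = re.search(pattern, text)
    if m is None or m.group(group) is None:
        return ""
    return m.group(group)


rows = ['10% OF VALUE', '20% OF VALUE', 'This is null VALUE', '0 is the value']

# (value, Description) pairs, built with the same patterns as the Spark code.
result = [
    (extract(r'.*%', row, 0), extract(r'(.*%|.{0})(.*)', row, 2))
    for row in rows
]

for value, description in result:
    print(repr(value), repr(description))
```

In rows without '%', the first alternative `.*%` fails, the empty alternative `.{0}` matches instead, and group 2 takes the whole string, which is why those rows land in 'Description' and not in 'value'.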