Home > Blockchain >  How to get the numeric characters from the Alphanumeric string in Spark-Sql
How to get the numeric characters from the Alphanumeric string in Spark-Sql

Time:10-23

spark.sql("select case when length(date)>0  then regexp_extract(date,'\\\\d ', 0) else '' end as date from input").show(false)

The above spark code gives the output for the first instance of the numeric characters.

sample input: 1234avf456wef sample output: 1234456

It should display all the numeric characters from the string. Thanks in Advance

CodePudding user response:

You may try the following modification which uses regexp_replace to remove all non-numeric characters.

Below is a working example which also includes removing special characters from '1234vf456wef &%'

spark.sql("with input as (select '1234vf456wef &%' as date union all select '123' union all select 'a very long 123 string with 456 content.') select date as original,case when length(date)>0  then regexp_replace(date,'[^0-9]', '') else '' end as date from input").show()

Outputs

 ---------------------------------------- ------- 
|original                                |date   |
 ---------------------------------------- ------- 
|1234vf456wef &%                         |1234456|
|123                                     |123    |
|a very long 123 string with 456 content.|123456 |
 ---------------------------------------- ------- 

Let me know if this works for you.

CodePudding user response:

It is better to use REG_REPLACE

Maybe you need to adept the regular expression to your needs

SELECT REGEXP_REPLACE('1234avf456wef ', '[a-zA-Z]', '');
| REGEXP_REPLACE('1234avf456wef ', '[a-zA-Z]', '') |
| :----------------------------------------------- |
| 1234456                                          |

db<>fiddle here

Add the unwanted special characters to the reg string

SELECT REGEXP_REPLACE('1234avf456wef 5§$$§%&"%%', '[a-zA-Z "§$%&/]', '');
| REGEXP_REPLACE('1234avf456wef 5§$$§%&"%%', '[a-zA-Z "§$%&/]', '') |
| :------------------------------------------------------------------- |
| 12344565                                                             |

db<>fiddle here

As you have a string use string functions to cut of at 8 c digits otr pad the string as you need

SELECT LPAD(LEFT(REGEXP_REPLACE('1234avf456wef 5§$$§%&"%%', '[a-zA-Z "§$%&/]', ''),8),8,' ');
| LPAD(LEFT(REGEXP_REPLACE('1234avf456wef 5§$$§%&"%%', '[a-zA-Z "§$%&/]', ''),8),8,' ') |
| :--------------------------------------------------------------------------------------- |
| 12344565                                                                                 |

db<>fiddle here

  • Related