spark.sql("select case when length(date)>0 then regexp_extract(date,'\\\\d ', 0) else '' end as date from input").show(false)
The above spark code gives the output for the first instance of the numeric characters.
sample input: 1234avf456wef sample output: 1234456
It should display all the numeric characters from the string. Thanks in Advance
CodePudding user response:
You may try the following modification which uses regexp_replace
to remove all non-numeric characters.
Below is a working example which also includes removing special characters from '1234vf456wef &%'
spark.sql("with input as (select '1234vf456wef &%' as date union all select '123' union all select 'a very long 123 string with 456 content.') select date as original,case when length(date)>0 then regexp_replace(date,'[^0-9]', '') else '' end as date from input").show()
Outputs
---------------------------------------- -------
|original |date |
---------------------------------------- -------
|1234vf456wef &% |1234456|
|123 |123 |
|a very long 123 string with 456 content.|123456 |
---------------------------------------- -------
Let me know if this works for you.
CodePudding user response:
It is better to use REG_REPLACE
Maybe you need to adept the regular expression to your needs
SELECT REGEXP_REPLACE('1234avf456wef ', '[a-zA-Z]', '');
| REGEXP_REPLACE('1234avf456wef ', '[a-zA-Z]', '') | | :----------------------------------------------- | | 1234456 |
db<>fiddle here
Add the unwanted special characters to the reg string
SELECT REGEXP_REPLACE('1234avf456wef 5§$$§%&"%%', '[a-zA-Z "§$%&/]', '');
| REGEXP_REPLACE('1234avf456wef 5§$$§%&"%%', '[a-zA-Z "§$%&/]', '') | | :------------------------------------------------------------------- | | 12344565 |
db<>fiddle here
As you have a string use string functions to cut of at 8 c digits otr pad the string as you need
SELECT LPAD(LEFT(REGEXP_REPLACE('1234avf456wef 5§$$§%&"%%', '[a-zA-Z "§$%&/]', ''),8),8,' ');
| LPAD(LEFT(REGEXP_REPLACE('1234avf456wef 5§$$§%&"%%', '[a-zA-Z "§$%&/]', ''),8),8,' ') | | :--------------------------------------------------------------------------------------- | | 12344565 |
db<>fiddle here