Remove all characters apart from numbers in PySpark

Time:01-19

I have a column called cola with a string data type, for example "100.z" or "102c".

How do I get rid of all letters and any other characters apart from the digits, so that cola becomes "100" or "102"?

df.withColumn('cola', regexp_replace('cola', 'charsgohere', ''))

CodePudding user response:

You can use the regex [^0-9] to match any non-digit. For example:

from pyspark.sql import functions as F
df.withColumn('cola_cleaned', F.regexp_replace('cola', '[^0-9]', ''))

Result:

+------+------------+
|  cola|cola_cleaned|
+------+------------+
| 100.z|         100|
|  102c|         102|
|x1022-|        1022|
+------+------------+
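
If you want to sanity-check the pattern without starting a Spark session, the same character class can be tried with Python's built-in re module. This is only a rough sketch: keep_digits is a hypothetical helper name, not part of PySpark, and Spark's regexp_replace uses Java regular expressions rather than Python's. For a simple class like [^0-9] the two should behave the same on the sample values from the question.

```python
import re

# Same character class as in the answer: anything that is not an ASCII digit.
NON_DIGITS = re.compile(r"[^0-9]")


def keep_digits(value):
    """Hypothetical helper: strip every non-digit character from a string."""
    return NON_DIGITS.sub("", value)


# Sample values taken from the question and the result table above.
samples = ["100.z", "102c", "x1022-"]
cleaned = [keep_digits(s) for s in samples]
print(cleaned)
```

Note that a value containing no digits at all comes back as an empty string rather than a null, so you may want to handle that case separately if you later cast the column to a numeric type.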