I have a column called cola with string data type, for example "100.z" or "102c".
How do I get rid of all letters and any other characters apart from the digits, so that cola becomes "100" or "102"?
df.withColumn('cola', regexp_replace('cola', 'charsgohere', ''))
Answer:
You can use the regex [^0-9] to match any character that is not a digit, and replace each match with an empty string. For example:
from pyspark.sql import functions as F
df.withColumn('cola_cleaned', F.regexp_replace('cola', '[^0-9]', ''))
Result:
+------+------------+
|  cola|cola_cleaned|
+------+------------+
| 100.z|         100|
|  102c|         102|
|x1022-|        1022|
+------+------------+
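
If you want to sanity-check the pattern without a Spark session, here is a minimal sketch in plain Python. It assumes the character class [^0-9] behaves the same in Python's re module as in the regex engine Spark uses, which holds for a simple class like this one. The helper name strip_non_digits is made up for illustration and is not part of PySpark.

```python
import re

# Same pattern as in the answer: any character that is not an ASCII digit.
NON_DIGITS = re.compile(r"[^0-9]")


def strip_non_digits(value: str) -> str:
    """Hypothetical helper: drop every character except the digits 0-9."""
    return NON_DIGITS.sub("", value)


# Sample values taken from the question and the result table above.
samples = ["100.z", "102c", "x1022-"]
cleaned = [strip_non_digits(s) for s in samples]
print(cleaned)
```

Note that the pattern also removes decimal points and minus signs, so a value like "1.5" would become "15", and a value with no digits at all becomes an empty string.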