I have a string column that I need to filter. I need to get all the values that contain letters or special characters.
Initial column:
id |
---|
12345 |
23456 |
3940A |
19045 |
2BB56 |
3(40A |
Expected output:
id |
---|
3940A |
2BB56 |
3(40A |
TIA
Answer:
A simple digits-only regex solves this. ^\d+$ matches values that consist entirely of digits, so keep the rows where it does not match.
from pyspark.sql import functions as F
# regexp_extract returns '' when the id is not all digits; keep only those rows
df.where(F.regexp_extract('id', r'^\d+$', 0) == '').show()
+-----+
|   id|
+-----+
|3940A|
|2BB56|
|3(40A|
+-----+
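The regex logic can be checked outside Spark with Python's re module. This is a small sketch on the sample ids from the question, not a Spark job. It assumes re.ASCII is used so that \d means only [0-9], which should mirror the default \d in Spark's Java-based regex engine.

```python
import re

# Sample ids from the question (plain Python strings, not a Spark DataFrame)
ids = ["12345", "23456", "3940A", "19045", "2BB56", "3(40A"]

# re.ASCII restricts \d to [0-9], matching Java's default \d used by Spark
ALL_DIGITS = re.compile(r"^\d+$", re.ASCII)

# Keep values the all-digits pattern does NOT match,
# the same idea as regexp_extract(...) == '' in the Spark query
non_numeric = [s for s in ids if not ALL_DIGITS.match(s)]
print(non_numeric)  # ['3940A', '2BB56', '3(40A']
```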
Answer:
The question was very vague, so here is the best answer that I can give:
from pyspark.sql import functions as F
df_filtered = df.filter(F.col('id').rlike(r'\D'))  # keep ids containing any non-digit character
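The per-character test `any(not c.isdigit() for c in s)` is valid for a plain Python string, but a Spark Column cannot be iterated like that. A minimal pure-Python sketch, using the sample ids from the question, compares that test with a regex search for one non-digit character (\D). The function names are hypothetical, and re.ASCII is an assumption chosen to mirror Java's [0-9]-only \d; note that str.isdigit also accepts non-ASCII digits, so the two can differ outside plain ASCII input.

```python
import re

# Sample ids from the question
ids = ["12345", "23456", "3940A", "19045", "2BB56", "3(40A"]

def has_non_digit_loop(s: str) -> bool:
    # Character-by-character check: True if any character is not a digit
    return any(not c.isdigit() for c in s)

def has_non_digit_regex(s: str) -> bool:
    # Same check as a regex: \D matches a single non-digit character
    return re.search(r"\D", s, re.ASCII) is not None

loop_result = [s for s in ids if has_non_digit_loop(s)]
regex_result = [s for s in ids if has_non_digit_regex(s)]
print(loop_result)                  # ['3940A', '2BB56', '3(40A']
print(loop_result == regex_result)  # True
```

In Spark, the regex form can run natively through Column.rlike, which avoids the overhead of a Python UDF that would be needed to apply the character loop.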