Filtering on a column: PySpark

Time:09-17

I want to filter a DataFrame column so that it keeps only the numeric values (digit codes).

main_column
HKA1774348
null
774970331205
160-27601033
SGSIN/62/898805
null
LOCAL
217-29062806
null
176-07027893
724-22100374
297-00371663
217-11580074

I want to obtain this column:

main_column
774970331205
160-27601033
217-29062806
176-07027893
724-22100374
297-00371663
217-11580074

CodePudding user response:

You can use rlike with a regexp that matches only digits and hyphens:

df.where(df['main_column'].rlike('^[0-9-]+$')).show()

Output:

+------------+
| main_column|
+------------+
|774970331205|
|160-27601033|
|217-29062806|
|176-07027893|
|724-22100374|
|297-00371663|
|217-11580074|
+------------+
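
The pattern can be sanity-checked without a Spark session by running it against the sample values with Python's re module. This is only a sketch: rlike uses Java regular expressions, and the assumption here is that both engines treat this simple pattern the same way. The names DIGIT_CODE, values, and kept are illustrative and not part of the original answer.

```python
import re

# Same pattern as in the rlike call: one or more digits or hyphens,
# anchored at both ends so the whole value has to match.
DIGIT_CODE = re.compile(r'^[0-9-]+$')

# Sample values from the question; None stands in for null.
values = [
    "HKA1774348", None, "774970331205", "160-27601033",
    "SGSIN/62/898805", None, "LOCAL", "217-29062806", None,
    "176-07027893", "724-22100374", "297-00371663", "217-11580074",
]

# rlike evaluates to null for null input, so where() drops those rows;
# the explicit None check mimics that behaviour here.
kept = [v for v in values if v is not None and DIGIT_CODE.search(v)]

for v in kept:
    print(v)
```

Note that the pattern also accepts values made up only of hyphens, such as "-". If that matters for your data, a stricter variant like '^[0-9]+(-[0-9]+)*$' might be a better fit.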