PySpark regexp_extract: extract only the number from a text string that also contains special characters

Time:07-09

I am trying to extract only the numbers from a freeText column. The column contains text like DH-09878877ABC, 9009898DEC, or qwert9876788plk.

I want to extract the numbers with the PySpark code below, but it is not working. Please advise.

df = df.withColumn("acount_nbr", regexp_extract(df['freeText'], r'(^[0-9])', 1))

Thanks

CodePudding user response:

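If you just want to extract the number, and the input contains at most one run of digits, use the regex pattern [0-9]+. Your pattern ^[0-9] is anchored to the start of the string and matches a single digit, so it returns at most one character, and an empty string whenever the text does not begin with a digit:

from pyspark.sql.functions import regexp_extract
df = df.withColumn("acount_nbr", regexp_extract(df['freeText'], r'([0-9]+)', 1))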
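
To see the difference between the two patterns without a Spark session, here is a minimal sketch using Python's built-in re module on the three sample values from the question. The helper name first_digit_run is hypothetical. It only imitates regexp_extract's documented behaviour of returning an empty string when nothing matches. Spark uses Java regular expressions rather than Python's, but for patterns this simple the two should behave the same.

```python
import re

# Sample values taken from the question.
samples = ["DH-09878877ABC", "9009898DEC", "qwert9876788plk"]

def first_digit_run(text, pattern):
    """Hypothetical stand-in for regexp_extract(col, pattern, 1):
    return group 1 of the first match, or '' if there is no match."""
    match = re.search(pattern, text)
    return match.group(1) if match else ""

# Pattern from the question: one digit, only at the start of the string.
anchored = [first_digit_run(s, r"(^[0-9])") for s in samples]

# Pattern from the answer: the first run of one or more digits, anywhere.
unanchored = [first_digit_run(s, r"([0-9]+)") for s in samples]

print(anchored)    # ['', '9', '']
print(unanchored)  # ['09878877', '9009898', '9876788']
```

The extracted value is still a string, so the leading zero in 09878877 is preserved.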