I am trying to extract only the numbers from a freeText column. The column holds values like DH-09878877ABC, 9009898DEC, or qwert9876788plk.
I tried the PySpark code below, but it's not working. Please advise.
df = df.withColumn("acount_nbr", regexp_extract(df['freeText'], r'(^[0-9])', 1))
Thanks
CodePudding user response:
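If you just want to extract numbers, and assuming each input contains at most one run of digits, you should use the regex pattern [0-9]+ inside a capture group. Your pattern (^[0-9]) is anchored to the start of the string and has no quantifier, so it can only ever capture a single leading digit:
from pyspark.sql.functions import regexp_extract
df = df.withColumn("acount_nbr", regexp_extract(df['freeText'], r'([0-9]+)', 1))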
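To compare the two patterns without starting a Spark session, here is a minimal sketch using Python's built-in `re` module on the three sample values from the question. The helper `extract_group` is hypothetical and only imitates `regexp_extract`, which returns an empty string when the pattern does not match. Spark uses Java regular expressions rather than Python's, but for patterns this simple the behaviour should be the same.

```python
import re

def extract_group(text, pattern, idx):
    """Hypothetical stand-in for regexp_extract: return capture group
    `idx` of the first match, or an empty string if nothing matches."""
    match = re.search(pattern, text)
    return match.group(idx) if match else ""

# Sample values taken from the question.
samples = ["DH-09878877ABC", "9009898DEC", "qwert9876788plk"]

# Pattern from the question: anchored at the start, exactly one digit.
anchored = [extract_group(s, r"(^[0-9])", 1) for s in samples]

# Pattern with a + quantifier: the first run of one or more digits.
digit_run = [extract_group(s, r"([0-9]+)", 1) for s in samples]

print(anchored)   # ['', '9', '']
print(digit_run)  # ['09878877', '9009898', '9876788']
```

Note that the extracted value is still a string, so leading zeros such as the one in 09878877 are preserved; casting the column to a numeric type would drop them.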