I have a list of file paths and want to extract the string that appears between "hone/" and "-".
For example, if the string is 'abfss://[email protected]/alicona/hone/120009163_6722508_.csv', then I would like to extract '120009163'.
Since I have a list of such strings, I would like to do this in one line or recursively.
I am trying to do this in PySpark.
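If the paths are a plain Python list rather than a Spark DataFrame column, a single list comprehension with the standard `re` module covers the "one line" requirement. This is a minimal sketch that assumes the ID is everything between "hone/" and the first underscore. The sample paths are made up, and the `:=` operator needs Python 3.8 or later.

```python
import re

# Made-up sample paths; only the part after "hone/" matters for the match.
paths = [
    "root/alicona/hone/120009163_6722508_.csv",
    "root/alicona/hone/120009170_6722511_.csv",
]

# One line: capture the text between "hone/" and the first "_", skipping non-matches.
ids = [m.group(1) for p in paths if (m := re.search(r"hone/(.*?)_", p))]

print(ids)  # ['120009163', '120009170']
```

Paths that do not match are silently dropped here. Remove the `if` clause if you would rather get an error for them.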
Answer:
(?<=hone\/)(.*?)(?=_)
I used _ instead of - because the parts of the example file name are separated by underscores, which gives you the result that you want.
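To see what the lookaround pattern captures, it can be tried with Python's `re` module, which handles this fixed-width lookbehind, lazy quantifier and lookahead the same way as the Java regex engine behind Spark. The sample paths are made up for illustration.

```python
import re

# Lookbehind for "hone/", lazy capture, lookahead for the first "_".
pattern = re.compile(r"(?<=hone\/)(.*?)(?=_)")

match = pattern.search("root/alicona/hone/120009163_6722508_.csv")
# Lookarounds consume nothing, so the whole match equals capture group 1.
print(match.group(0), match.group(1))  # 120009163 120009163

# With "-" separators there is no "_" after "hone/", so nothing matches.
print(pattern.search("root/alicona/hone/120009163-6722508-.csv"))  # None
```

In PySpark the same pattern string would be passed to `regexp_extract` with group index 1.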
Answer:
You could use the regex pattern /(\d+)\w*\.\w+$, which captures the digits at the start of the file name:
from pyspark.sql.functions import regexp_extract
df.select(regexp_extract('path', r'/(\d+)\w*\.\w+$', 1))  # group 1 = leading digits of the file name
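This pattern is anchored to the end of the path with $, so it keys on the file name rather than on the "hone/" directory. Spark's `regexp_extract` returns an empty string for rows where the pattern does not match. The sketch below mimics that behaviour in plain Python so the pattern can be checked on a few made-up paths. `extract_like_spark` is a hypothetical helper, not a Spark API.

```python
import re

def extract_like_spark(path: str, pattern: str, idx: int) -> str:
    """Hypothetical helper: return group `idx`, or "" when nothing matches,
    mirroring what Spark's regexp_extract yields for a non-matching row."""
    m = re.search(pattern, path)
    return (m.group(idx) or "") if m else ""

PATTERN = r"/(\d+)\w*\.\w+$"

print(extract_like_spark("root/alicona/hone/120009163_6722508_.csv", PATTERN, 1))  # 120009163
# Works in any directory, because only the file name at the end is inspected.
print(extract_like_spark("root/other/dir/555_1.parquet", PATTERN, 1))  # 555
# A file name that does not start with digits gives "" instead of an error.
print(repr(extract_like_spark("root/alicona/hone/report_1.csv", PATTERN, 1)))  # ''
```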