Find number between two alphabets in an alphanumeric text in a pyspark dataframe column

Time:11-03

I have a column in my dataframe with the values below:

A123R221343
A12323Q123213
L122F898

There are always two letters in the text: the first character is a letter, and the second letter can be the 4th, 5th, 6th, or 7th character.

I would like to derive a new column in PySpark containing only the digits between the two letters:

123
12323
122

I tried the regexes [A-Za-z].*[A-Za-z] and [\d].*[A-Za-z], but they also return the letters, which I do not want. I'm completely new to regex.

CodePudding user response:

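The pattern [A-Za-z].*[A-Za-z] matches everything from the first letter up to and including the last letter, because .* is greedy and matches any character, so both letters end up in the result.

The pattern [\d].*[A-Za-z] behaves the same way, except that the match starts at the first digit and nothing requires a letter A-Za-z before it. The trailing letter is still part of the match.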
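To make the behaviour of the two attempted patterns concrete, here is a small sketch using Python's built-in re module rather than Spark (an assumption for illustration only; for patterns this simple the two engines should agree). The variable names are made up for the example.

```python
import re

# Sample values taken from the question.
samples = ["A123R221343", "A12323Q123213", "L122F898"]

# First attempted pattern: starts at the first letter, and the greedy .*
# runs up to the last letter, so both letters are included in the match.
letter_to_letter = [re.search(r"[A-Za-z].*[A-Za-z]", s).group(0) for s in samples]

# Second attempted pattern: starts at the first digit instead, but still
# ends on (and includes) the last letter.
digit_to_letter = [re.search(r"[\d].*[A-Za-z]", s).group(0) for s in samples]

print(letter_to_letter)  # ['A123R', 'A12323Q', 'L122F']
print(digit_to_letter)   # ['123R', '12323Q', '122F']
```

In both cases the whole match is returned, which is why the letters show up in the output.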


Instead, match the two letters and capture only the digits between them in a capture group:

[A-Za-z](\d+)[A-Za-z]
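As a rough sketch of how this could be applied, the snippet below uses the pattern with \d+ (one or more digits) inside the capture group and reads back group 1. The plain-Python helper only checks the pattern against the sample values. The add_digits_column function shows one possible way to do the same in PySpark with regexp_extract. The column names "value" and "digits" and both function names are assumptions for illustration, not taken from the question.

```python
import re

# One letter, then one or more digits (captured as group 1), then a letter.
PATTERN = r"[A-Za-z](\d+)[A-Za-z]"


def digits_between_letters(text):
    """Return the digits between the first two letters, or '' if no match."""
    match = re.search(PATTERN, text)
    return match.group(1) if match else ""


def add_digits_column(df, source_col="value", target_col="digits"):
    """Add a column holding capture group 1 of PATTERN for each row.

    The column names are hypothetical defaults. pyspark is imported here
    so the rest of the snippet runs without a Spark installation.
    """
    from pyspark.sql import functions as F

    # regexp_extract returns an empty string when the pattern does not match.
    return df.withColumn(target_col, F.regexp_extract(F.col(source_col), PATTERN, 1))


samples = ["A123R221343", "A12323Q123213", "L122F898"]
extracted = [digits_between_letters(s) for s in samples]
print(extracted)  # ['123', '12323', '122']
```

With a dataframe df that has a string column named value, calling add_digits_column(df) would add a digits column containing only the captured digits.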

