Home > Software design >  Pandas str.extract() a number that ends in a letter
Pandas str.extract() a number that ends in a letter

Time:07-08

I have a pandas column like below:

      df['description']
0.   PRAIRIE HIGHLANDS SIXTH PLAT Lt: 156 PIN# DP73770000 0156 312 ABC
1.   PRAIRIE VILLAGE   PIN# OP55000034 0020A Rmrk: PT OF
2.   Sub: HOLLY GREEN Lt: 14 Bl: 1 PIN# DP34500001 0D14
3.   FAIRWAY   PIN# GP20000006 0029 Rmrk: W

I want to extract out PIN# such that expected results:

DP73770000 0156 312
OP55000034 0020A
DP34500001 0D14
GP20000006 0029

What I have tried:

df['PIN'] = df['description'].str.extract(r'\b(?:PIN# ?) (\w (?:(?:\s*[ ]) \s*\d )*)\b')

Results I got:

DP73770000 0156 312
OP55000034
DP34500001
GP20000006 0029

The ones that have letters in between the numbers after space are not getting picked up. I have tried a couple other regex as well to no success. How can I resolve this? Please help.

CodePudding user response:

I would use str.extract as follows:

df["PIN"] = df["description"].str.extract(r'PIN#((?: [A-Z0-9]*[0-9][A-Z0-9]*)*)')

Here is a link to a running regex demo showing that the logic is working.

  • Related