Pandas find matching sub-strings and split on them-CodePudding

I have a pandas column that looks like this:

0                         LOT 8 BLK 32 MISSIONARY HEIGHTS 
1                          LTS 31-39 & PT 30, 15-22 LEWIS 
2                                          LT 2 DEPOT SUB
....

I want to extract out the lot number from this but sometimes they are spelled "LOT", "LTS" or "LT". And I want to extract out the lot number into a new column. So, for the example above, I would have:

         df['Lot']
0             8
1           31-39
2             2

How can I do this? Please advise.

CodePudding user response：

We can use str.extract here:

df["Lot"] = df["Address"].str.extract(r'\b(?:LOT|LTS?) (\d (?:-\d )*)')

Here is a demo showing that the regex extraction logic is working.

To remove these lot values from the address, you may use str.replace:

df["Other"] = df["Address"].str.replace(r'\s*(?:LOT|LTS?) \d (?:-\d )*\s*', ' ').str.strip()

The final call to strip() is to remove dangling leading/trailing whitespace, which would occur for those addresses either starting or ending with a lot quantity.