I have a pandas column that looks like this:
0 LOT 8 BLK 32 MISSIONARY HEIGHTS
1 LTS 31-39 & PT 30, 15-22 LEWIS
2 LT 2 DEPOT SUB
....
I want to extract out the lot number from this but sometimes they are spelled "LOT", "LTS" or "LT". And I want to extract out the lot number into a new column. So, for the example above, I would have:
df['Lot']
0 8
1 31-39
2 2
How can I do this? Please advise.
CodePudding user response:
We can use str.extract
here:
df["Lot"] = df["Address"].str.extract(r'\b(?:LOT|LTS?) (\d (?:-\d )*)')
Here is a demo showing that the regex extraction logic is working.
To remove these lot values from the address, you may use str.replace
:
df["Other"] = df["Address"].str.replace(r'\s*(?:LOT|LTS?) \d (?:-\d )*\s*', ' ').str.strip()
The final call to strip()
is to remove dangling leading/trailing whitespace, which would occur for those addresses either starting or ending with a lot quantity.