To clean a dataset, I need to split a string after the last digit. Any ideas?
My dataframe:
import pandas as pd
data = {'addr':[
"510 -1, Cleveland St",
"RC-20-5345 Poplar Street",
"3600 Race Avenue Richardson"]}
df = pd.DataFrame(data)
addr
_____________________________________
510 -1, Cleveland St
RC-20-5345 Poplar Street
3600 Race Avenue Richardson
I tried this expression, but it misses the floor number (RC) in the second row.
df["split1"] = df["addr"].str.extract(r"(\d+[-\ ]+\d*)")
split1 | split2
___________|_________________________
510 -1 | , Cleveland St
20-5345 | Poplar Street
3600 | Race Avenue Richardson
What I'm looking for:
split1 | split2
___________|_________________________
510 -1 | , Cleveland St
RC-20-5345 | Poplar Street
3600 | Race Avenue Richardson
CodePudding user response:
What about just adding a wildcard match to the front of the regex?
df["split1"] = df["addr"].str.extract(r"(.*\d+[-\ ]+\d*)")
CodePudding user response:
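def splitByLastDigit(x):
    lastDigit = 0
    splitOne = ""
    splitTwo = ""
    finalArray = []
    # First pass: find the index of the last digit in the string.
    for i in range(0, len(x)):
        if x[i].isdigit() and i > lastDigit:
            lastDigit = i
    # Second pass: characters up to and including that index go to splitOne,
    # everything after it goes to splitTwo.
    for i in range(0, len(x)):
        if i <= lastDigit:
            splitOne += x[i]
        else:
            splitTwo += x[i]
    finalArray.append(splitOne)
    finalArray.append(splitTwo)
    return finalArray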
Just wrote up this solution. It is a bit rough (it can definitely be done more elegantly), but I tested it with the three examples you provided and it gets the job done.
Pretty simple idea: the first loop collects the index of the last digit, then a second loop checks which characters fall before and after that index. Lastly, it appends both parts to an array and returns the result.
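The same idea can be written more compactly with slicing. This is only a sketch: the name `split_after_last_digit` is made up for illustration, and unlike the function above it returns an empty first part when the string contains no digit at all.

```python
def split_after_last_digit(text):
    """Return [head, tail], where head ends at the last digit in text."""
    # Index of the last digit, or -1 when the string has no digits.
    last = max((i for i, ch in enumerate(text) if ch.isdigit()), default=-1)
    # Slice once instead of building the two parts character by character.
    return [text[:last + 1], text[last + 1:]]


for addr in ["510 -1, Cleveland St",
             "RC-20-5345 Poplar Street",
             "3600 Race Avenue Richardson"]:
    print(split_after_last_digit(addr))
```

On a DataFrame this could be applied row by row, for example with `df["addr"].apply(split_after_last_digit)`, which gives a column of two-element lists.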
CodePudding user response:
To piggyback on zyd's answer, capture the remainder in another group:
data = {'addr':[
"510 -1, Cleveland St",
"RC-20-5345 Poplar Street",
"3600 Race Avenue Richardson"]}
df = pd.DataFrame(data)
df[['split1','split2']] = df["addr"].str.extract(r"(.*\d+[-\ ]+\d*)(.+)")
                          addr      split1                  split2
0         510 -1, Cleveland St      510 -1          , Cleveland St
1     RC-20-5345 Poplar Street  RC-20-5345           Poplar Street
2  3600 Race Avenue Richardson        3600  Race Avenue Richardson
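If the goal is simply to cut right after the final digit, a shorter pattern may be enough. Below is a minimal sketch using the standard `re` module rather than pandas. The name `split_regex` and the pattern `^(.*\d)(.*)$` are illustrative and not taken from the answers above. Note that with this pattern the space after the digits stays at the start of the second part.

```python
import re

# Greedy ".*" runs to the end of the string and backtracks to the final digit;
# the second group then takes whatever follows that digit.
LAST_DIGIT = re.compile(r"^(.*\d)(.*)$")


def split_regex(text):
    """Return (head, tail) split after the last digit, or None if no digit."""
    m = LAST_DIGIT.match(text)
    if m is None:
        return None
    return m.group(1), m.group(2)


for addr in ["510 -1, Cleveland St",
             "RC-20-5345 Poplar Street",
             "3600 Race Avenue Richardson"]:
    print(split_regex(addr))
```

The same two-group pattern should also be usable with `df["addr"].str.extract(...)`, which returns one column per capture group; add `.str.strip()` on the second column if the leading space is unwanted.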