I have a dataset about villas and their information. The title column contains the neighborhood's name ("a" in the examples below) and I want to extract it. The problem is that the values are not consistent; some of them look like this:
- Villa in street b, neighborhood a, area c, city d.
- Villa in neighborhood a, area c, city d.
- Villa in neighborhood “blank”, street b, neighborhood a, area c, city d.
I used a loop to split the string after the word "neighborhood":
for i in title:
    print(i.split("neighborhood")[1])
It worked for the first two types of rows, but for the third one it returned the first "blank" value instead of "a".
CodePudding user response:
Based on the information you've given, here's what I came up with. Note that it only works for the patterns you've shared; any other format would need more complex code than the following.
import pandas as pd

data = ["Villa in street b, neighborhood a, area c, city d.", "Villa in neighborhood a, area c, city d.",
        "Villa in neighborhood “blank”, street b, neighborhood a, area c, city d."]
df = pd.DataFrame(data, columns=["Title"])

for i in df.Title:
    if i.count("neighborhood") == 1:
        # Only one "neighborhood": the name follows it.
        print(i.split("neighborhood")[1])
    else:
        # Two occurrences: the name follows the second one.
        print(i.split("neighborhood")[2])
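If the name you want always follows the last "neighborhood" in the title, a variant without the count check might look like the sketch below. It assumes the name ends at the next comma, and `last_neighborhood` is just a hypothetical helper name, not something from pandas.

```python
titles = [
    "Villa in street b, neighborhood a, area c, city d.",
    "Villa in neighborhood a, area c, city d.",
    "Villa in neighborhood “blank”, street b, neighborhood a, area c, city d.",
]


def last_neighborhood(title):
    """Return the text after the last 'neighborhood', up to the next comma."""
    if "neighborhood" not in title:
        return None  # nothing to extract from this row
    # rsplit with maxsplit=1 cuts at the LAST occurrence only.
    tail = title.rsplit("neighborhood", 1)[-1]
    # Assumption: the name ends at the first comma that follows it.
    return tail.split(",", 1)[0].strip()


names = [last_neighborhood(t) for t in titles]
print(names)
```

With a DataFrame, the same function could be applied to the column with df.Title.apply(last_neighborhood).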
CodePudding user response:
You could try regular expressions, which make it easier to find patterns in text. Below is a code sample that should work. Note that re.findall takes the pattern first and the string to search second.
import re

matches = re.findall(r'neighborhood (\w+)', 'neighborhood a')
print(matches)
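To cover all three row types from the question, one option is to collect every match and keep the last one. This is only a sketch: it assumes the wanted name is always the last "neighborhood" entry and that names never contain a comma. `extract_neighborhood` and `PATTERN` are hypothetical names picked for the example.

```python
import re

# Capture everything after "neighborhood" up to the next comma.
PATTERN = re.compile(r"neighborhood\s+([^,]+)")


def extract_neighborhood(title):
    """Return the last neighborhood name found in the title, or None."""
    matches = PATTERN.findall(title)
    # Assumption: when several matches exist, the last one is the real name.
    return matches[-1].strip() if matches else None


titles = [
    "Villa in street b, neighborhood a, area c, city d.",
    "Villa in neighborhood a, area c, city d.",
    "Villa in neighborhood “blank”, street b, neighborhood a, area c, city d.",
]

results = [extract_neighborhood(t) for t in titles]
print(results)
```

Using [^,]+ instead of \w+ lets the capture include multi-word names, at the cost of also matching the "blank" entry, which is why the last match is taken.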