I want to extract the neighborhood name from the title column in Python pandas

I have a dataset about villas and their details. The title column contains the neighborhood's name ("a" in the examples below), and I want to extract it. The problem is that the values are not consistent; some of them look like this:

Villa in street b, neighborhood a, area c, city d.
Villa in neighborhood a, area c, city d.
Villa in neighborhood “blank”, street b, neighborhood a, area c, city d.

I used a loop to split the string after the word "neighborhood":

for i in title:
    print(i.split("neighborhood")[1])

This worked for the first two types of rows, but for the third one it returned the text after the first "neighborhood" (the “blank” one) instead of "a".

CodePudding user response：

Based on the information you've shared, here's what I came up with. It only works for the patterns of values in your examples; anything else would need more complex code.

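import pandas as pd

# The three title patterns from the question
data = ["Villa in street b, neighborhood a, area c, city d.", "Villa in neighborhood a, area c, city d.",
        "Villa in neighborhood “blank”, street b, neighborhood a, area c, city d."]
df = pd.DataFrame(data, columns=["Title"])

for i in df.Title:
    if i.count("neighborhood") == 1:
        # One occurrence: print everything after it, e.g. " a, area c, city d."
        print(i.split("neighborhood")[1])
    else:
        # Two occurrences: skip the “blank” one and print what follows the second
        print(i.split("neighborhood")[2])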
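A variation on the idea above, as a minimal sketch: instead of counting occurrences, split once from the right so the last "neighborhood" is always the one used, then cut at the next comma to keep only the name. The helper name `last_neighborhood` is made up for illustration, and the sketch assumes the real name always follows the last occurrence of the word, as in the three sample rows.

```python
# Sample titles copied from the question
titles = [
    "Villa in street b, neighborhood a, area c, city d.",
    "Villa in neighborhood a, area c, city d.",
    "Villa in neighborhood “blank”, street b, neighborhood a, area c, city d.",
]

def last_neighborhood(title):
    """Return the name after the last 'neighborhood', or None if the word is absent."""
    if "neighborhood" not in title:
        return None
    # rsplit with maxsplit=1 splits only at the right-most occurrence
    tail = title.rsplit("neighborhood", 1)[-1]
    # Keep the text up to the next comma and trim spaces and a trailing period
    return tail.split(",")[0].strip(" .")

names = [last_neighborhood(t) for t in titles]
print(names)  # ['a', 'a', 'a']
```

With a DataFrame like the one above, the same helper could be applied with `df["Title"].apply(last_neighborhood)` to build a new column.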

CodePudding user response：

You could try regular expressions, which make it easier to find patterns in text. Below is a code sample that should work.

import re

# re.findall takes the pattern first, then the string to search
string = re.findall(r'neighborhood (\w{1,})', 'neighborhood a')
print(string)  # ['a']
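To show how the regular expression idea might handle all three row types from the question, here is a rough sketch. The pattern and the helper name `neighborhood_from_regex` are assumptions for illustration, not part of the original answer: the pattern captures everything after "neighborhood" up to the next comma or period, and the helper keeps the last match so the “blank” entry is skipped.

```python
import re

# Sample titles copied from the question
titles = [
    "Villa in street b, neighborhood a, area c, city d.",
    "Villa in neighborhood a, area c, city d.",
    "Villa in neighborhood “blank”, street b, neighborhood a, area c, city d.",
]

# Capture the text after "neighborhood" up to the next comma or period
PATTERN = re.compile(r"neighborhood\s+([^,.]+)")

def neighborhood_from_regex(title):
    """Return the last captured neighborhood name, or None when nothing matches."""
    matches = PATTERN.findall(title)
    if not matches:
        return None
    # The real name is assumed to be the last match in the title
    return matches[-1].strip()

results = [neighborhood_from_regex(t) for t in titles]
print(results)  # ['a', 'a', 'a']
```

Capturing up to the next comma rather than using `\w+` also keeps multi-word names intact, at the cost of assuming that names never contain a comma or a period.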