I want to extract the neighborhood name from the title column in python pandas

Time:06-29

I have a dataset about villas and their details. The title column contains the neighborhood name ("a" in the examples below), and I want to extract it. The problem is that the values are not consistent. Some of them look like this:

  • Villa in street b, neighborhood a, area c, city d.
  • Villa in neighborhood a, area c, city d.
  • Villa in neighborhood “blank”, street b, neighborhood a, area c, city d.

I used a loop to split the string after the word "neighborhood"

for i in title:
    print(i.split("neighborhood")[1])

This worked for the first two types of rows, but for the third one it returned the "blank" that follows the first "neighborhood" instead of "a".

CodePudding user response:

Based on the information you've shared, here's what I came up with. It only works for the patterns in your examples; anything else would need more complex code.

import pandas as pd

# Sample titles covering the three patterns from the question
data = ["Villa in street b, neighborhood a, area c, city d.", "Villa in neighborhood a, area c, city d.",
        "Villa in neighborhood “blank”, street b, neighborhood a, area c, city d."]
df = pd.DataFrame(data, columns=["Title"])
for i in df.Title:
    if i.count("neighborhood") == 1:
        # One occurrence: keep the text after it
        print(i.split("neighborhood")[1])
    else:
        # Two occurrences: skip the blank one and keep the text after the second
        print(i.split("neighborhood")[2])
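If you only want the name itself rather than the rest of the string, one possible variant is to split on the last occurrence of "neighborhood" and cut at the next comma. This is just a sketch that assumes the real neighborhood always follows the last "neighborhood" and is terminated by a comma; the helper name last_neighborhood is made up for illustration.

```python
# Sketch only: assumes the wanted name follows the LAST "neighborhood"
# in the title and ends at the next comma.
titles = [
    "Villa in street b, neighborhood a, area c, city d.",
    "Villa in neighborhood a, area c, city d.",
    "Villa in neighborhood “blank”, street b, neighborhood a, area c, city d.",
]


def last_neighborhood(title):
    """Hypothetical helper: return the name after the last 'neighborhood'."""
    # rpartition splits on the last occurrence of the separator
    _, sep, tail = title.rpartition("neighborhood")
    if not sep:
        # The word does not appear in this title at all
        return None
    # Keep only the text up to the next comma and trim whitespace
    return tail.split(",")[0].strip()


names = [last_neighborhood(t) for t in titles]
print(names)  # ['a', 'a', 'a']
```

With a DataFrame, the same function could be applied to the column, for example df["Title"].apply(last_neighborhood).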

CodePudding user response:

You could try regular expressions; they make it easier to find patterns in text. Below is a code sample that should work.

import re

# re.findall takes the pattern first, then the string to search
string = re.findall(r'neighborhood (\w{1,})', 'neighborhood a')
print(string)
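To apply the regex idea to the whole column, something like the following might work. It is a sketch under two assumptions: the name you want follows the last "neighborhood" in the title, and it ends at the next comma. The greedy `.*` at the start pushes the match to the last occurrence, so the blank one in the third pattern is skipped. The column name "Neighborhood" is just a placeholder.

```python
import pandas as pd

data = [
    "Villa in street b, neighborhood a, area c, city d.",
    "Villa in neighborhood a, area c, city d.",
    "Villa in neighborhood “blank”, street b, neighborhood a, area c, city d.",
]
df = pd.DataFrame(data, columns=["Title"])

# Greedy ".*" consumes as much as possible, so the capture group binds to
# the LAST "neighborhood"; "[^,]+" then captures everything up to the comma.
pattern = r".*neighborhood\s+([^,]+)"

# expand=False returns a Series because the pattern has a single group;
# rows without a match become NaN.
df["Neighborhood"] = df["Title"].str.extract(pattern, expand=False)
print(df["Neighborhood"].tolist())  # ['a', 'a', 'a']
```

Capturing with `[^,]+` instead of `\w+` also keeps neighborhood names that contain spaces, if your real data has any.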
