np.select and str.extract if cond str.contains a certain regex is not working as expected

Time:03-05

I am trying to extract the state (or province) from an address string. Some of the addresses are Canadian and some are American. I think the regex is correct, but the result is an array of shape (29999, 29999) and I don't understand why.

Here is a sample of `data['Address']`:

19                  6349 IN-45, Bloomington, IN 47403
20                                                  ~
21  370 Canyon Meadows Dr SE, Calgary, AB T2J 7C6,...
22                 3600 Genesee St, Buffalo, NY 14225

Here is my code:

data['state'] = np.select(
    [data['Address'].str.contains(r',(\s.*\s[0-9])'),
     data['Address'].str.contains(r',(\s.*\s[A-Za-z][0-9])')],
    [data['Address'].str.extract(r',(\s.*\s[0-9])'),
     data['Address'].str.extract(r',(\s.*\s[A-Za-z][0-9])')],
)

Any help appreciated.

CodePudding user response:

Update

Try:

data['State'] = data['Address'].str.extract(r',\s([^\s,]+)\s')
print(data)

# Output
                                              Address State
19                  6349 IN-45, Bloomington, IN 47403    IN
20                                                  ~   NaN
21  370 Canyon Meadows Dr SE, Calgary, AB T2J 7C6,...    AB
22                 3600 Genesee St, Buffalo, NY 14225    NY
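The snippet above can be reproduced end to end with a small frame built from the sample rows in the question. This is only a sketch: the Calgary address is truncated in the question, so just its visible part is used here, and `expand=False` is an optional choice added for illustration so that `str.extract` returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample rows copied from the question. The Calgary address is truncated
# there ("..."), so only the visible part is used (assumption).
data = pd.DataFrame(
    {
        "Address": [
            "6349 IN-45, Bloomington, IN 47403",
            "~",
            "370 Canyon Meadows Dr SE, Calgary, AB T2J 7C6",
            "3600 Genesee St, Buffalo, NY 14225",
        ]
    },
    index=[19, 20, 21, 22],
)

# A comma and one whitespace character, then a run of characters that are
# neither whitespace nor commas, then whitespace. "Bloomington" is skipped
# because it is followed by a comma, not by whitespace.
pattern = r",\s([^\s,]+)\s"

# expand=False yields a Series for a single capture group.
data["State"] = data["Address"].str.extract(pattern, expand=False)
print(data)
```

Rows without a match, such as the `~` placeholder, come back as NaN.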

Old answer

Is this what you expect?

data['State'] = data['Address'].str.extract(r',(\s.*\s(?:[A-Za-z])?[0-9])')
print(data)

# Output
                                              Address               State
19                  6349 IN-45, Bloomington, IN 47403   Bloomington, IN 4
20                                                  ~                 NaN
21  370 Canyon Meadows Dr SE, Calgary, AB T2J 7C6,...   Calgary, AB T2J 7
22                 3600 Genesee St, Buffalo, NY 14225       Buffalo, NY 1

I combined the two patterns from your choice list:

r',(\s.*\s[0-9])'
r',(\s.*\s[A-Za-z][0-9])'

into a single expression:

r',(\s.*\s(?:[A-Za-z])?[0-9])'
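As for the (29999, 29999) shape in the question, a likely cause is broadcasting. By default `str.extract` returns a one-column DataFrame of shape (n, 1), while `str.contains` returns a Series of shape (n,), and `np.select` broadcasts the two to (n, n). The sketch below shows the effect on three made-up rows. The variable names are illustrative, and the conversions with `to_numpy` are only there to make the shapes explicit.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    [
        "6349 IN-45, Bloomington, IN 47403",
        "~",
        "3600 Genesee St, Buffalo, NY 14225",
    ]
)

# Condition: boolean array of shape (3,). The pattern has no capture group
# here, since str.contains only needs to test for a match.
cond = s.str.contains(r",\s[^\s,]+\s", na=False).to_numpy(dtype=bool)

pattern = r",\s([^\s,]+)\s"
# Default expand=True: one-column DataFrame, shape (3, 1).
as_frame = s.str.extract(pattern).to_numpy(dtype=object)
# expand=False: Series, shape (3,).
as_series = s.str.extract(pattern, expand=False).to_numpy(dtype=object)

# (3,) broadcast against (3, 1) gives (3, 3), the n-by-n array from the question.
square = np.select([cond], [as_frame], default=np.nan)
print(square.shape)

# With matching one-dimensional shapes the result stays (3,).
flat = np.select([cond], [as_series], default=np.nan)
print(flat.shape)
```

If `np.select` is still wanted for the two-pattern approach, passing `expand=False` to each `str.extract` call should keep every choice one-dimensional.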