Home > Software engineering >  Split string column based on values in list and add it to another column in Pandas DataFrame
Split string column based on values in list and add it to another column in Pandas DataFrame

Time:10-05

I have pandas dataframe

import pandas as pd
data = {"Column1": ["258 E SONORA ST SAN",
                    "57474 SAXONY WAY APT 223 WESLEY",  
                    "62748 CALIFORNIA ST APT 2 SAN",    
                    "3211 LONGLAKE DR FERN",    
                    "420 S PLYMOUTH CT APT 265",    
                    "AHLONA L LABARRE POA -L 274 NESTLINGWOOD DR LONG", 
                    "224-22 141 STREET RICHMOND",   
                    "15624 274TH ST CAMBRIA",   
                    "778 SANTO DOMINGO AVE SW PALM",    
                    "261 BROADMOOR DR SOUTH SIOUX"],    

        "Colum2" : ["BERNARDINO", "CHAPEL", "FRANCISCO", "CREEK", "CHICAGO", "VALLEY", "HILL", "HEIGHTS", "BAY", "CITY"]}

df = pd.DataFrame(data)

df

Output


                  Column1                                 Colum2
0   258 E SONORA ST SAN                                 BERNARDINO
1   57474 SAXONY WAY APT 223 WESLEY                     CHAPEL
2   62748 CALIFORNIA ST APT 2 SAN                       FRANCISCO
3   3211 LONGLAKE DR FERN                               CREEK
4   420 S PLYMOUTH CT APT 265                           CHICAGO
5   AHLONA L LABARRE POA -L 274 NESTLINGWOOD DR LONG    VALLEY
6   224-22 141 STREET RICHMOND                          HILL
7   15624 274TH ST CAMBRIA                              HEIGHTS
8   778 SANTO DOMINGO AVE SW PALM                       BAY
9   261 BROADMOOR DR SOUTH SIOUX                        CITY

and I have a list of values at where I need to split the string in column1

split_city = ["ST","DR", "STREET", "AVE SW"]

I also want to include a split after APT and digit characters.

How can I Split a string of columns based on values in the list and add them to another column in Pandas DataFrame?

Required Output


                  Column1                                 Colum2
0   258 E SONORA ST                                     SAN BERNARDINO
1   57474 SAXONY WAY APT 223                            WESLEY CHAPEL
2   62748 CALIFORNIA ST APT 2                           SAN FRANCISCO
3   3211 LONGLAKE DR                                    FERN CREEK
4   420 S PLYMOUTH CT APT 265                           CHICAGO
5   AHLONA L LABARRE POA -L 274 NESTLINGWOOD DR         LONG VALLEY
6   224-22 141 STREET                                   RICHMOND HILL
7   15624 274TH ST                                      CAMBRIA HEIGHTS
8   778 SANTO DOMINGO AVE SW                            PALM BAY
9   261 BROADMOOR DR                                    SOUTH SIOUX CITY

CodePudding user response:

I don't know if there's a nice way in Pandas to do this, but because the edge cases are so numerous here, it'd be easier to combine the addresses and then use regex instead of trying to split on your split choices (while also accounting for apartments):

Solution

import re


pattern = re.compile(r"([-\d\w ]*)\s(ST|WAY|DR|STREET|AVE|N|S|E|W|SW|SE|NW|NE|APT \d*)\s([\w ]*)")


column1 = ["258 E SONORA ST SAN",
           "57474 SAXONY WAY APT 223 WESLEY",
           "62748 CALIFORNIA ST APT 2 SAN",
           "3211 LONGLAKE DR FERN",
           "420 S PLYMOUTH CT APT 265",
           "AHLONA L LABARRE POA -L 274 NESTLINGWOOD DR LONG",
           "224-22 141 STREET RICHMOND",
           "15624 274TH ST CAMBRIA",
           "778 SANTO DOMINGO AVE SW PALM",
           "261 BROADMOOR DR SOUTH SIOUX"]


column2 = ["BERNARDINO", "CHAPEL", "FRANCISCO", "CREEK", "CHICAGO", "VALLEY", "HILL", "HEIGHTS", "BAY", "CITY"]


combined = [" ".join(t) for t in zip(column1, column2)]
streets = []
cities = []
for t in (re.findall(pattern, s) for s in combined):
    *street, city = t[0]
    streets.append(" ".join(street))
    cities.append(city)


df = pd.DataFrame({"street": streets, "city": cities})

Output:

In [10]: pd.DataFrame({"street": streets, "city": cities})
Out[10]:
                                        street              city
0                              258 E SONORA ST    SAN BERNARDINO
1                     57474 SAXONY WAY APT 223     WESLEY CHAPEL
2                    62748 CALIFORNIA ST APT 2     SAN FRANCISCO
3                             3211 LONGLAKE DR        FERN CREEK
4                    420 S PLYMOUTH CT APT 265           CHICAGO
5  AHLONA L LABARRE POA -L 274 NESTLINGWOOD DR       LONG VALLEY
6                            224-22 141 STREET     RICHMOND HILL
7                               15624 274TH ST   CAMBRIA HEIGHTS
8                     778 SANTO DOMINGO AVE SW          PALM BAY
9                             261 BROADMOOR DR  SOUTH SIOUX CITY
  • Related