Extracting strings from a list by a specific word

Time:10-06

I have a column of addresses in pandas and I want to select only the addresses in the US, but I either get an empty list or an error is thrown.

Here's what I have done:

0             238 Lincoln St, Hahnville, LA 70057, USA
1             101 Home Pl Ln, Hahnville, LA 70057, USA
2          1250 Poydras St, New Orleans, LA 70113, USA
3         1117 Broadway STE 401, Tacoma, WA 98402, USA
4              2715 N Junett St, Tacoma, WA 98407, USA
5          Hillstrust Primary School, 29 Nethan St, Govan, Glasgow G51 3LX, UK
6                                5778 JM Godalming, UK
7       569 Durham Rd, Low Fell, Gateshead NE9 5EY, UK
8                 Pennine Way, Barnard Castle DL12, UK
9               14 Studios Rd, Shepperton TW17 0QW, UK


matching = [s for s in final_data["full_address"] if "USA" in s]
matching
#returns: TypeError: argument of type 'float' is not iterable

#Whereas
ab = [final_data["full_address"]]
matching = [s for s in ab if "USA" in s]
matching
#returns: []

Expected output:

0             238 Lincoln St, Hahnville, LA 70057, USA
1             101 Home Pl Ln, Hahnville, LA 70057, USA
2          1250 Poydras St, New Orleans, LA 70113, USA
3         1117 Broadway STE 401, Tacoma, WA 98402, USA
4              2715 N Junett St, Tacoma, WA 98407, USA

CodePudding user response:

Try this:

import pandas as pd

data = {
    'full_address': [
        '238 Lincoln St, Hahnville, LA 70057, USA', '101 Home Pl Ln, Hahnville, LA 70057, USA', '1250 Poydras St, New Orleans, LA 70113, USA',
        '1117 Broadway STE 401, Tacoma, WA 98402, USA', '2715 N Junett St, Tacoma, WA 98407, USA', '5778 JM Godalming, UK', '569 Durham Rd, Low Fell, Gateshead NE9 5EY, UK',
        'Pennine Way, Barnard Castle DL12, UK', '14 Studios Rd, Shepperton TW17 0QW, UK'
    ]
}

df = pd.DataFrame(data)

# na=False treats missing values (NaN) as non-matches, so the mask stays purely boolean
matching = df[df['full_address'].str.contains("USA", na=False)]
print(matching)

Output:

                                   full_address
0      238 Lincoln St, Hahnville, LA 70057, USA
1      101 Home Pl Ln, Hahnville, LA 70057, USA
2   1250 Poydras St, New Orleans, LA 70113, USA
3  1117 Broadway STE 401, Tacoma, WA 98402, USA
4       2715 N Junett St, Tacoma, WA 98407, USA
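
The TypeError in the question suggests the real column contains missing values (NaN is a float), which the sample data above does not reproduce. Here is a minimal sketch under that assumption; the NaN row is made up for illustration and is not from the original data. It also shows why wrapping the column in a list returns []: the list then holds a single Series, and `in` on a Series checks its index labels, not its values.

```python
import pandas as pd

# Hypothetical sample: the NaN entry stands in for a missing address.
final_data = pd.DataFrame({
    "full_address": [
        "238 Lincoln St, Hahnville, LA 70057, USA",
        float("nan"),
        "5778 JM Godalming, UK",
    ]
})

# `in` on a Series looks at the index labels (0, 1, 2), not the values.
in_series = "USA" in final_data["full_address"]

# Plain-Python version: skip anything that is not a string before testing it.
matching_list = [
    s for s in final_data["full_address"]
    if isinstance(s, str) and "USA" in s
]

# pandas version: na=False makes missing values count as "no match".
mask = final_data["full_address"].str.contains("USA", na=False)
matching_df = final_data[mask]

print(in_series)
print(matching_list)
print(matching_df)
```

If the column turns out to have no missing values, the `isinstance` check and `na=False` are harmless, so it should be safe to keep them either way.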

CodePudding user response:

Hello, I tried to recreate your scenario and it works for me. I just added a query with a contains condition on the specific column, which here is country:

    import pandas as pd

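    # Build a DataFrame of addresses (the column is named 'country' in this example)
    names = ['238 Lincoln St, Hahnville, LA 70057, USA', '101 Home Pl Ln, Hahnville, LA 70057, USA', 'Hillstrust Govan, Glasgow G51 3LX, UK']

    # Use a name other than `dict` so the built-in is not shadowed
    data = {'country': names}
    cars = pd.DataFrame(data)

    # Keep only the rows whose 'country' value contains "USA"
    b = cars.query('country.str.contains("USA")', engine='python')
    print(b)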
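
One caveat with a contains check is that it matches "USA" anywhere in the string. If every US address ends with the country name, as in the question's sample, a stricter test may be preferable. This is only a sketch under that assumption; the second address and the NaN row are invented to show the difference.

```python
import pandas as pd

addresses = pd.Series([
    "238 Lincoln St, Hahnville, LA 70057, USA",
    "1 USA Trading Estate, Example Town, UK",  # made-up address containing "USA"
    float("nan"),                              # made-up missing value
])

# Strip surrounding whitespace, then require the string to end with "USA".
# na=False treats missing values as non-matches.
ends_with_usa = addresses.str.strip().str.endswith("USA", na=False)

print(addresses[ends_with_usa])
```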