Find and replace a string using another list of strings in Python

I have a text

original = '3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 US NC'

I need to find US and move it to the end, like this:

lookup= 'US'

result = original.replace(lookup, "") + " " + str(lookup)

output: 3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 NC US

However, what if I have lookup as a list like below:

lookup = ['US','CA','INDIA','CHINA']

and multiple inputs as a list like below:

input = ['3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 US NC','200 LINE AVE STE 360 GBORO 77611 CA NC','60 Indiranagar INDIA Bangalore']

I need to find the country in each input string and move it to the end of that string.

I tried many methods but couldn't get it to work. Any help would be greatly appreciated.

Thanks

CodePudding user response：

You can start by breaking the string down into tokens:

>>> address = '3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 US NC'
>>> tokens = address.split()
>>> tokens
['3200', 'NORTHLINE', 'AVE', 'STE', '360', 'GREENSBORO', '27408-7611', 'US', 'NC']

If you want to put a country code at the end, you can sort the list of tokens with a key function that is True only for the country code. Example:

>>> sorted_tokens = sorted(tokens, key=lambda token: token == "US")
>>> sorted_tokens
['3200', 'NORTHLINE', 'AVE', 'STE', '360', 'GREENSBORO', '27408-7611', 'NC', 'US']
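
To see why only the country code moves, it may help to look at the key values themselves. The sketch below uses a shortened token list (an assumption, chosen only to keep the output short): the key is False for ordinary tokens and True for the match, False sorts before True, and Python's sort is stable, so the remaining tokens keep their original order.

```python
# Shortened token list, for illustration only.
tokens = ['3200', 'NORTHLINE', 'AVE', 'US', 'NC']

# The key is a boolean: False for ordinary tokens, True for the country code.
keys = [token == "US" for token in tokens]
print(keys)  # [False, False, False, True, False]

# sorted() is stable, so tokens with equal keys keep their relative order;
# only the token whose key is True ends up last.
sorted_tokens = sorted(tokens, key=lambda token: token == "US")
print(sorted_tokens)  # ['3200', 'NORTHLINE', 'AVE', 'NC', 'US']
```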

To put this back into a string, you can use a join:

>>> sorted_address = ' '.join(sorted_tokens)
>>> sorted_address
'3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 NC US'

To make this work for a variety of country codes, you only need to modify the key slightly:

lookup = ['US','CA','INDIA','CHINA']
sorted_tokens = sorted(tokens, key=lambda token: token in lookup)

Putting this all together:

lookup = ['US','CA','INDIA','CHINA']
addresses = [
    '3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 US NC',
    '200 LINE AVE STE 360 GBORO 77611 CA NC',
    '60 Indiranagar INDIA Bangalore',
]


def sort_address(address):
    # Split into whitespace-separated tokens.
    tokens = address.split()
    # Stable sort: tokens found in lookup (key True) move to the end.
    sorted_tokens = sorted(tokens, key=lambda token: token in lookup)
    return ' '.join(sorted_tokens)


for address in addresses:
    print(sort_address(address))

3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 NC US
200 LINE AVE STE 360 GBORO 77611 NC CA
60 Indiranagar Bangalore INDIA
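
If you would rather collect the results in a list than print them, a variant along these lines might work. This is only a sketch: the function name `move_country_last` and the `countries` parameter are made up for illustration, and converting the lookup to a set is an optional tweak for faster membership tests.

```python
def move_country_last(address, countries):
    """Return address with any token found in countries moved to the end."""
    country_set = set(countries)  # set membership tests are fast
    tokens = address.split()
    # Stable sort on a boolean key: matching tokens go last, the rest keep order.
    return ' '.join(sorted(tokens, key=lambda token: token in country_set))


lookup = ['US', 'CA', 'INDIA', 'CHINA']
addresses = [
    '3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 US NC',
    '200 LINE AVE STE 360 GBORO 77611 CA NC',
    '60 Indiranagar INDIA Bangalore',
]

results = [move_country_last(address, lookup) for address in addresses]
print(results)
```

Note that the match is on whole tokens and is case-sensitive, so 'Indiranagar' is left alone, but a 'CA' token meant as a state abbreviation would be moved as well.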

CodePudding user response：

We can use a regex alternation approach as follows:

lookup = ['US', 'CA', 'INDIA', 'CHINA']
regex = r'\s+(' + r'|'.join(lookup) + r')(.*)'
df["address"] = df["address"].str.replace(regex, r'\2 \1', regex=True)

Assuming you are using a regular Python list, rather than a data frame, we can try a similar approach:

import re

lookup = ['US', 'CA', 'INDIA', 'CHINA']
regex = r'\s+(' + r'|'.join(lookup) + r')(.*)'
inp = ['3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 US NC','200 LINE AVE STE 360 GBORO 77611 CA NC','60 Indiranagar INDIA Bangalore']
output = [re.sub(regex, r'\2 \1', x) for x in inp]
print(output)

This prints:

['3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 NC US',
 '200 LINE AVE STE 360 GBORO 77611 NC CA',
 '60 Indiranagar Bangalore INDIA']
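
One caveat with a plain alternation is that it can match the start of a longer word, for example the CA in CANAL. A possible refinement, sketched below, adds `\b` word boundaries and passes each entry through `re.escape`. The first input string is a made-up address used only to show the difference; it is not from the question.

```python
import re

lookup = ['US', 'CA', 'INDIA', 'CHINA']

# re.escape guards against regex metacharacters in the lookup entries;
# \b on both sides makes the alternation match whole words only.
pattern = re.compile(r'\s+\b(' + '|'.join(map(re.escape, lookup)) + r')\b(.*)')

# '12 CANAL ST US NY' is a hypothetical address for illustration.
inp = ['12 CANAL ST US NY', '60 Indiranagar INDIA Bangalore']
output = [pattern.sub(r'\2 \1', x) for x in inp]
print(output)  # ['12 CANAL ST NY US', '60 Indiranagar Bangalore INDIA']
```

Without the word boundaries, the first string would match at ' CA' inside ' CANAL' and come out as '12NAL ST US NY CA'.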