I have a string:
original = '3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 US NC'
I need to find US and move it to the end, like this:
lookup= 'US'
result = original.replace(lookup, "") + " " + str(lookup)
output: 3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 NC US
However, what if I have lookup as a list like below:
lookup = ['US','CA','INDIA','CHINA']
and multiple input as a list like below:
input = ['3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 US NC','200 LINE AVE STE 360 GBORO 77611 CA NC','60 Indiranagar INDIA Bangalore']
For each string in the input list, I need to find the country and move it to the end of that string.
I tried many methods but couldn't get it to work. Your help will be greatly appreciated.
Thanks
CodePudding user response:
You can start by breaking the string down into tokens:
>>> address = '3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 US NC'
>>> tokens = address.split()
>>> tokens
['3200', 'NORTHLINE', 'AVE', 'STE', '360', 'GREENSBORO', '27408-7611', 'US', 'NC']
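If you want to put a country code at the end, you can sort the list of tokens with a key
function. The key is False for ordinary tokens and True for the country code; False sorts
before True, and Python's sort is stable, so all other tokens keep their order. Example: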
>>> sorted_tokens = sorted(tokens, key=lambda token: token == "US")
>>> sorted_tokens
['3200', 'NORTHLINE', 'AVE', 'STE', '360', 'GREENSBORO', '27408-7611', 'NC', 'US']
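Why this works may not be obvious at first glance, so here is a small sketch that prints the intermediate keys. The shortened token list and the `keys` variable are only for illustration and are not part of the original answer:

```python
tokens = ['3200', 'NORTHLINE', 'AVE', 'US', 'NC']

# The key is a bool for each token: True only for the country code.
keys = [token == "US" for token in tokens]
print(keys)  # [False, False, False, True, False]

# False < True, and sorted() is stable, so every non-matching token keeps
# its relative order and the matching token ends up last.
sorted_tokens = sorted(tokens, key=lambda token: token == "US")
print(sorted_tokens)  # ['3200', 'NORTHLINE', 'AVE', 'NC', 'US']
```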
To put this back into a string, you can use join:
>>> sorted_address = ' '.join(sorted_tokens)
>>> sorted_address
'3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 NC US'
To make this work for a variety of country codes, you only need to modify the key slightly:
lookup = ['US','CA','INDIA','CHINA']
sorted_tokens = sorted(tokens, key=lambda token: token in lookup)
Putting this all together:
lookup = ['US','CA','INDIA','CHINA']
addresses = [
    '3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 US NC',
    '200 LINE AVE STE 360 GBORO 77611 CA NC',
    '60 Indiranagar INDIA Bangalore',
]

def sort_address(address):
    # Split on whitespace, move any token found in lookup to the end,
    # then join the tokens back into a single string.
    tokens = address.split()
    sorted_tokens = sorted(tokens, key=lambda token: token in lookup)
    return ' '.join(sorted_tokens)

for address in addresses:
    print(sort_address(address))
3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 NC US
200 LINE AVE STE 360 GBORO 77611 NC CA
60 Indiranagar Bangalore INDIA
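If the country might appear in mixed case (for example 'India' rather than 'INDIA'), one possible variation is to compare upper-cased tokens against a set. This is only a sketch: the case-insensitivity requirement, the name `sort_address_ci`, and the `LOOKUP` constant are assumptions that do not appear in the question.

```python
# Assumed: countries should match regardless of case. A set also makes
# each membership test constant-time.
LOOKUP = {'US', 'CA', 'INDIA', 'CHINA'}

def sort_address_ci(address, lookup=LOOKUP):
    """Move tokens that match a country (ignoring case) to the end."""
    tokens = address.split()
    # token.upper() is used only for the comparison; the original
    # spelling of each token is preserved in the result.
    return ' '.join(sorted(tokens, key=lambda token: token.upper() in lookup))

print(sort_address_ci('60 Indiranagar India Bangalore'))
# 60 Indiranagar Bangalore India
```

Note that, as with the version above, every token that equals a lookup entry is moved, so a street or suite token that happens to be 'CA' would move as well.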
CodePudding user response:
We can use a regex alternation approach as follows:
lookup = ['US', 'CA', 'INDIA', 'CHINA']
regex = r'\s+(' + r'|'.join(lookup) + r')(.*)'
df["address"] = df["address"].str.replace(regex, r'\2 \1', regex=True)
Assuming you are using a regular Python list, rather than a data frame, we can try a similar approach:
import re
lookup = ['US', 'CA', 'INDIA', 'CHINA']
regex = r'\s+(' + r'|'.join(lookup) + r')(.*)'
inp = ['3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 US NC','200 LINE AVE STE 360 GBORO 77611 CA NC','60 Indiranagar INDIA Bangalore']
output = [re.sub(regex, r'\2 \1', x) for x in inp]
print(output)
This prints:
['3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 NC US',
'200 LINE AVE STE 360 GBORO 77611 NC CA',
'60 Indiranagar Bangalore INDIA']
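One caveat with a plain alternation is that a short code such as 'CA' can also match the start of a longer word. A hedged variation adds a trailing word boundary and escapes the lookup entries; the helper name `move_country` and the sample address containing 'CARY' are made up for illustration:

```python
import re

lookup = ['US', 'CA', 'INDIA', 'CHINA']

# re.escape guards against regex metacharacters in the names; \b stops
# 'CA' from matching the start of a longer word such as 'CARY'.
pattern = re.compile(r'\s+(' + '|'.join(map(re.escape, lookup)) + r')\b(.*)')

def move_country(address):
    # \2 is everything after the country, \1 is the country itself.
    return pattern.sub(r'\2 \1', address)

print(move_country('200 LINE AVE CARY 27511 CA NC'))
# 200 LINE AVE CARY 27511 NC CA
```

Without the `\b`, the same input would match at ' CARY' and split that word, which is probably not what is wanted.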