Replace all list elements based on other list elements in Python

I have a list, list_a:

list_a = ['some phrase austria', 'another slovakia', 'three paris', 'four united arab emirates']

Then I have a list of countries

countries = ['aruba', 'afghanistan', 'angola', 'austria', 'åland islands', 'albania', 'andorra', 'united arab emirates', 'argentina', 'armenia', 'american samoa', 'antarctica', 'slovakia', 'antigua and barbuda', 'australia']

I want to check whether any item from the countries list occurs in each string of list_a and, if it does, remove that substring (the country name), i.e. replace it with ''.

So the correct output should look like this:

final_list = ['some phrase', 'another', 'three paris', 'four'] # since paris is not in the countries list

I tried with the following:

matching = [s for s in list_a if any(xs in s for xs in countries)]
print(matching)

So now I have a list of the three matching items:

['some phrase austria', 'another slovakia', 'four united arab emirates']

And I am stuck here but I guess the solution is close.

CodePudding user response：

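Nearly there. Instead of only filtering, remove the first matching country from each string:

final_list = [next((s.replace(c, '').strip() for c in countries if c in s), s) for s in list_a]
print(final_list)
['some phrase', 'another', 'three paris', 'four']

Note that this is plain substring matching, so a country name would also match inside a longer word.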

CodePudding user response：

If the strings are just space-separated words with no punctuation, the easiest approach is to split() each string, filter out every word that is in the countries list, and then " ".join the rest back together. Because this compares single words, it cannot remove multi-word names such as 'united arab emirates'.

>>> [' '.join(w for w in s.split() if w not in countries) for s in list_a]
['some phrase', 'another', 'three paris', 'four united arab emirates']

If performance is an issue, you might want to convert countries to a set first to speed up the lookup. If there is punctuation, or if country names span several words, you might instead construct a regex from the countries, in the form \b(c1|c2|...|cn)\b, and replace every match with the empty string.

>>> import re
>>> p = r"\b(" + "|".join(countries) + r")\b"
>>> [re.sub(p, "", w) for w in list_a]
['some phrase ', 'another ', 'three paris', 'four ']

This method can leave trailing or double spaces in the strings, though, which would need to be cleaned up in a second step.
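
That second step could look roughly like the sketch below. It reuses list_a and countries from the question; the helper name strip_countries is made up for illustration. Two further choices here are assumptions rather than part of the answer above: each name is passed through re.escape in case it contains regex metacharacters, and longer names are tried first so a short name cannot win over a longer one that starts with it.

```python
import re

list_a = ['some phrase austria', 'another slovakia', 'three paris', 'four united arab emirates']
countries = ['aruba', 'afghanistan', 'angola', 'austria', 'åland islands', 'albania', 'andorra', 'united arab emirates', 'argentina', 'armenia', 'american samoa', 'antarctica', 'slovakia', 'antigua and barbuda', 'australia']

def strip_countries(strings, names):
    # Hypothetical helper: remove whole-word country names, then tidy whitespace.
    # Longest names first, each escaped so it is matched literally.
    alternatives = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    pattern = re.compile(r"\b(?:" + alternatives + r")\b")
    # split() followed by " ".join collapses double spaces and drops
    # leading/trailing ones left behind by the substitution.
    return [" ".join(pattern.sub("", s).split()) for s in strings]

final_list = strip_countries(list_a, countries)
print(final_list)
```

Compiling the pattern once also avoids rebuilding it for every string, which may matter for long lists.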