I have a function that analyses the transfer text (Buchungstext) of a bank transfer. Python checks whether a word is contained in the transfer text and then returns a GUID; the GUID is the target booking account. Now I would like some entries in the list to require two words combined with an AND operation, instead of searching for a single word. It is important that the words can appear in any order in the sentence.
I've tried it like this before: 'LINEA'|'MADRID'
buchungstext = "METRO MADRID LINEA 7 MASTERCARD - MADRID "
# Tickets and transport
elif any(wort.upper() in buchungstext.upper() for wort in [ 'LINEA'|'MADRID' ,'METRO BARCELONA','METRO DE MADRID','LUFTHANSA','Trainline','SNCF TGV.COM','LIM*FAHRTKOSTEN','DB Reisezentrum','FINNAIR','DB Vertrieb GmbH','BOLT.EU','OEBB', 'DB BAHN A-NR','UBER','Flixbus','TIER','MVG RAD']):
    GUIDzwei = "d45xxxxxxxxxxxxxx013ab953ef26af2"
    return()
# Hotels
elif any(wort.upper() in buchungstext.upper() for wort in ['Hotel']):
    GUIDzwei = "d45xxxxxxxxxxxxxx013ab953ef26af2"
    return()
# Mail services, post, DHL
elif any(wort.upper() in buchungstext.upper() for wort in ['bpost','Deutsche Post','UPS','DHL']):
    GUIDzwei = "d45xxxxxxxxxxxxxx013ab953ef26af2"
    return()
# etc....
CodePudding user response:
If I understand the problem correctly, this may help.
You have a string (buchungstext) made up of a number of whitespace-delimited words.
You want to find out if all of a set of words exist in that string.
The search is not case sensitive.
Therefore:
buchungstext = "METRO MADRID LINEA 7 MASTERCARD - MADRID"
def check(sentence, words):
    # tokenise and convert to lowercase
    los = {w.lower() for w in sentence.split()}
    return all(k.lower() in los for k in words)
print(check(buchungstext, ['mastercard', 'metro']))
print(check(buchungstext, ['mastercard', 'munich']))
Output:
True
False
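This gives a logical AND over the list passed as the second parameter. If you want a logical OR, just change all to any. Note that the text is split into single words, so a multi-word entry such as 'METRO BARCELONA' will never match with this approach.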
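To plug an AND entry into the keyword lists from the question, one option is to let each list entry be either a single phrase or a tuple of phrases that must all be present. The following is only a minimal sketch: the names matches_rule and TRANSPORT_RULES are hypothetical, and it uses the substring test from the question rather than whole-word tokens, so that multi-word entries keep working.

```python
def matches_rule(text, rule):
    """Return True if the rule matches the text, ignoring case.

    A rule is either a single phrase (substring match) or a tuple of
    phrases that must all occur, in any order (logical AND).
    """
    haystack = text.upper()
    if isinstance(rule, str):
        return rule.upper() in haystack
    return all(part.upper() in haystack for part in rule)


# Hypothetical excerpt of the transport list; the tuple is the AND entry.
TRANSPORT_RULES = [("LINEA", "MADRID"), "METRO BARCELONA", "LUFTHANSA"]

buchungstext = "METRO MADRID LINEA 7 MASTERCARD - MADRID "
print(any(matches_rule(buchungstext, rule) for rule in TRANSPORT_RULES))
print(any(matches_rule("LINEA 7 PARIS", rule) for rule in TRANSPORT_RULES))
```

The first call prints True because both LINEA and MADRID occur; the second prints False because MADRID is missing and no other entry matches.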
CodePudding user response:
You can just change the in check to re.search.
The main question is how to test that all of the given words are present, in any order. We can do that with regex lookaheads:
r"(?=.*WORD1)(?=.*WORD2)"
So the crucial part could look like this:
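any(re.search(wort, buchungstext, re.I | re.DOTALL) for wort in [r"(?=.*LINEA)(?=.*MADRID)", "LUFTHANSA", "..."])
The two lookaheads are what require both words to be present; a plain alternation A|B would match as soon as either word is found.
The I flag makes the search ignore case.
The DOTALL flag makes .* match newlines as well.
The X (verbose) flag should not be added here: it makes the regex ignore whitespace inside the pattern, so an entry such as 'METRO BARCELONA' would no longer match. Entries that contain regex metacharacters, such as 'LIM*FAHRTKOSTEN' or 'SNCF TGV.COM', should be passed through re.escape first.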
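The lookahead pattern can also be built from a list of words instead of being written by hand. This is a minimal sketch, assuming a hypothetical helper named all_words_pattern; re.escape is applied so that characters such as * or . in an entry are matched literally.

```python
import re


def all_words_pattern(words):
    """Build a regex that requires every word to occur, in any order."""
    # One lookahead per word; each one scans ahead from the same position.
    return "".join(f"(?=.*{re.escape(word)})" for word in words)


buchungstext = "METRO MADRID LINEA 7 MASTERCARD - MADRID "
patterns = [
    all_words_pattern(["LINEA", "MADRID"]),  # AND of two words
    re.escape("LIM*FAHRTKOSTEN"),            # single literal entry
]
flags = re.IGNORECASE | re.DOTALL
print(any(re.search(p, buchungstext, flags) for p in patterns))
```

This prints True for the sample text, because both lookaheads succeed from the start of the string.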
CodePudding user response:
Can't you just use for wort in [ 'LINEA', 'MADRID' ,'METRO BARCELONA', ...]
since an OR operation is already implied when you look for any of the words in the list?