Searching for substrings in a list of dicts

Time:11-19

I have a list of dicts.

I need to search through the "Receiver" values and output only the dicts whose receiver value shares its last X characters with at least one other dict.

In this case, we compare the last 3 characters of each Receiver value against all other Receiver values.

This is what I have so far:

import re

transactions = [
{"Receiver":"alice111","Amount":50},
{"Receiver":"alice222","Amount":60},
{"Receiver":"alice111","Amount":70},
{"Receiver":"bob111","Amount":50},
{"Receiver":"bob222","Amount":150},
{"Receiver":"bob333","Amount":100},
{"Receiver":"kyle444","Amount":260},
{"Receiver":"richard555","Amount":260}
]
new_list=[]

for value in transactions:
    receiver = value["Receiver"]
    last_3 = receiver[-3:]
    #print(receiver)
    #print(last_3)
    for substring in transactions:
        if re.search(last_3 + r"$",substring["Receiver"]):
            #print("MATCH" + str(substring))
            new_list.append(substring)

print(new_list)
#[{'Receiver': 'alice111', 'Amount': 50}, {'Receiver': 'alice111', 'Amount': 70}, {'Receiver': 'bob111', 'Amount': 50}, {'Receiver': 'alice222', 'Amount': 60}, {'Receiver': 'bob222', 'Amount': 150}, {'Receiver': 'alice111', 'Amount': 50}, {'Receiver': 'alice111', 'Amount': 70}, {'Receiver': 'bob111', 'Amount': 50}, {'Receiver': 'alice111', 'Amount': 50}, {'Receiver': 'alice111', 'Amount': 70}, {'Receiver': 'bob111', 'Amount': 50}, {'Receiver': 'alice222', 'Amount': 60}, {'Receiver': 'bob222', 'Amount': 150}, {'Receiver': 'bob333', 'Amount': 100}, {'Receiver': 'kyle444', 'Amount': 260}, {'Receiver': 'richard555', 'Amount': 260}]

Unfortunately it's all wrong because it goes over the same values multiple times. With a longer list this would be a total disaster.

Desired output:

[{"Receiver":"alice111","Amount":50},{"Receiver":"alice222","Amount":60},{"Receiver":"alice111","Amount":70},{"Receiver":"bob111","Amount":50},{"Receiver":"bob222","Amount":150}]

The following should be omitted:

[{"Receiver":"bob333","Amount":100},{"Receiver":"kyle444","Amount":260},{"Receiver":"richard555","Amount":260}]

As you can see, no other receiver value ends in "333", "444" or "555", so these dicts are omitted. I'm not interested in outputting unique suffixes.

Update:

What if I wish to match only entries that share the 3-character suffix but DON'T all have the same preceding prefix (the characters before the suffix)?

transactions1 = [
{"Receiver":"alice111","Amount":50},
{"Receiver":"alice111","Amount":70},
{"Receiver":"bob222","Amount":50},
{"Receiver":"bob222","Amount":150},
{"Receiver":"bob222","Amount":100},
{"Receiver":"richard111","Amount":260},
{"Receiver":"bob333","Amount":100},
{"Receiver":"alice333","Amount":300},

]

New desired output:

[{"Receiver":"alice111","Amount":50}, {"Receiver":"alice111","Amount":70},{"Receiver":"richard111","Amount":260},{"Receiver":"bob333","Amount":100},{"Receiver":"alice333","Amount":300}]

So we only match if:

- the last 3 characters (the suffix) match AND an entry with a different name prefix exists for that suffix

Hope that's clear.

CodePudding user response:

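You can first count how many times each 3-character suffix occurs, and then keep only the dicts whose suffix has a count greater than 1. This is a single counting pass plus a single filtering pass, instead of comparing every dict against every other dict.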

from collections import Counter

transactions = [
    {"Receiver":"alice111","Amount":50},
    {"Receiver":"alice222","Amount":60},
    {"Receiver":"alice111","Amount":70},
    {"Receiver":"bob111","Amount":50},
    {"Receiver":"bob222","Amount":150},
    {"Receiver":"bob333","Amount":100},
    {"Receiver":"kyle444","Amount":260},
    {"Receiver":"richard555","Amount":260}
]

counter = Counter(transaction['Receiver'][-3:] for transaction in transactions)
output = [transaction for transaction in transactions if counter[transaction['Receiver'][-3:]] > 1]

print(output)
# [{'Receiver': 'alice111', 'Amount': 50},
#  {'Receiver': 'alice222', 'Amount': 60},
#  {'Receiver': 'alice111', 'Amount': 70},
#  {'Receiver': 'bob111', 'Amount': 50},
#  {'Receiver': 'bob222', 'Amount': 150}]
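
The suffix length is hard-coded to 3 above, while the question talks about the last X characters. Below is a minimal sketch that makes the length a parameter. The function name `filter_shared_suffix` and its `n` and `key` arguments are hypothetical names chosen for illustration, not part of the original answer.

```python
from collections import Counter


def filter_shared_suffix(transactions, n=3, key="Receiver"):
    """Keep dicts whose last-n-character suffix occurs in at least two dicts."""
    if n <= 0:
        # s[-0:] would return the whole string, so reject non-positive lengths.
        raise ValueError("n must be positive")
    counts = Counter(t[key][-n:] for t in transactions)
    return [t for t in transactions if counts[t[key][-n:]] > 1]


sample = [
    {"Receiver": "alice111", "Amount": 50},
    {"Receiver": "bob111", "Amount": 50},
    {"Receiver": "bob333", "Amount": 100},
]
print(filter_shared_suffix(sample))
# [{'Receiver': 'alice111', 'Amount': 50}, {'Receiver': 'bob111', 'Amount': 50}]
```

Note that a receiver shorter than `n` characters is compared by its whole value, since slicing past the start of a string simply returns the full string.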

CodePudding user response:

I hope I've understood your question correctly. With the new input from your question:

transactions1 = [
    {"Receiver": "alice111", "Amount": 50},
    {"Receiver": "alice111", "Amount": 70},
    {"Receiver": "bob222", "Amount": 50},
    {"Receiver": "bob222", "Amount": 150},
    {"Receiver": "bob222", "Amount": 100},
    {"Receiver": "richard111", "Amount": 260},
    {"Receiver": "bob333", "Amount": 100},
    {"Receiver": "alice333", "Amount": 300},
]

tmp = {}
for t in transactions1:
    suffix = t["Receiver"][-3:]
    tmp.setdefault(suffix, set()).add(t["Receiver"])

out = [t for t in transactions1 if len(tmp[t["Receiver"][-3:]]) > 1]
print(out)

Prints:

[
    {"Receiver": "alice111", "Amount": 50},
    {"Receiver": "alice111", "Amount": 70},
    {"Receiver": "richard111", "Amount": 260},
    {"Receiver": "bob333", "Amount": 100},
    {"Receiver": "alice333", "Amount": 300},
]
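
The difference from simply counting suffixes is that a set stores each full receiver value only once. The suffix "222" occurs three times in the input, but always as "bob222", so its set has a single element and those entries are dropped. The same idea can be sketched as a reusable function with `collections.defaultdict`. The name `filter_shared_suffix_distinct` and its `n` and `key` arguments are hypothetical, used here only for illustration.

```python
from collections import defaultdict


def filter_shared_suffix_distinct(transactions, n=3, key="Receiver"):
    """Keep dicts whose suffix is shared by at least two distinct receiver values."""
    receivers_by_suffix = defaultdict(set)
    for t in transactions:
        # Group the full receiver values under their last-n-character suffix.
        receivers_by_suffix[t[key][-n:]].add(t[key])
    return [t for t in transactions if len(receivers_by_suffix[t[key][-n:]]) > 1]


sample = [
    {"Receiver": "bob222", "Amount": 50},
    {"Receiver": "bob222", "Amount": 150},
    {"Receiver": "bob333", "Amount": 100},
    {"Receiver": "alice333", "Amount": 300},
]
print(filter_shared_suffix_distinct(sample))
# [{'Receiver': 'bob333', 'Amount': 100}, {'Receiver': 'alice333', 'Amount': 300}]
```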