Change a tuple within a list of tuples

I am reading in data from multiple Excel files and writing them back to an aggregated Excel file.

This is the output I have so far. It represents the relations of multiple entities within my company (entity-ID) with other companies (debitor-name):

debitor_list = [
    ("1", "X AG"),
    ("1", "X AG"),
    ("1", "Z AG"),
    ("2", "X AG"),
    ("2", "X AG"),
    ("3", "LOL AG"),
    ("1", "Z AG"), 
    ("1", "HS AG"),
    ("2", "hs ag")
]

Each tuple in this list has the following structure:

('entity-ID', 'debitor-name')

In addition, I have a list which represents the real names and information about debitors:

real_file = ["LOLLIPOP AG", "HS AG", "X AG", "Z AG"]

Then I check each debitor name in debitor_list for similarity against the names in real_file, so that I can replace it with the real name:

import difflib as dif

for deb in debitor_list:
    for cam in real_file:
        if deb[1] != cam:
            sequence = dif.SequenceMatcher(
                isjunk=None,
                a=deb[1].lower(),
                b=cam.lower()
            )
            match = sequence.ratio() * 100
            if (match >= 80):
                print(deb[1], cam, match)
                debitor_list.append((deb[0], cam))

Output:

hs ag HS AG 100.0

How can I delete the ("2", "hs ag") tuple?

CodePudding user response:

Either you build a new list that replaces the old one, or you replace the element in place with some simple logic. Both options are shown below.

Note that tuples are immutable, but the list that holds them is not, so you can assign a new tuple to an existing list position.
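A minimal sketch of that distinction, using a shortened two-element list (the variable names pair, pairs and raised are only illustrative):

```python
pair = ("2", "hs ag")
pairs = [("1", "HS AG"), pair]

raised = False
try:
    pair[1] = "HS AG"  # tuples do not support item assignment
except TypeError:
    raised = True

# Assigning a new tuple to the list slot is allowed.
pairs[1] = (pair[0], "HS AG")

print(raised)
print(pairs)
```

The original tuple object is never changed; the list simply points at a new tuple afterwards.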

import difflib as dif

debitor_list = [
    ("1", "X AG"),
    ("1", "X AG"),
    ("1", "Z AG"),
    ("2", "X AG"),
    ("2", "X AG"),
    ("3", "LOL AG"),
    ("1", "Z AG"),
    ("1", "HS AG"),
    ("2", "hs ag"),
]

real_file = ["LOLLIPOP AG", "HS AG", "X AG", "Z AG"]


# Option 1: build and return a new list, leaving the inputs untouched.
def fix_stuff(d_list, c_list):
    result = []
    for deb in d_list:
        repl_val = None
        for cam in c_list:
            if deb[1] != cam:
                sequence = dif.SequenceMatcher(
                    isjunk=None, a=deb[1].lower(), b=cam.lower()
                )
                match = sequence.ratio() * 100
                if match >= 80:
                    repl_val = cam
        if repl_val:
            result.append((deb[0], repl_val))
        else:
            result.append(deb)
    return result


print(debitor_list)
new_deb_list = fix_stuff(debitor_list, real_file)
print(new_deb_list)


# Option 2: replace matching tuples in place, by index.
for idx, deb in enumerate(debitor_list):
    for cam in real_file:
        if deb[1] != cam:
            sequence = dif.SequenceMatcher(isjunk=None, a=deb[1].lower(), b=cam.lower())
            match = sequence.ratio() * 100
            if match >= 80:
                debitor_list[idx] = (deb[0], cam)
print(debitor_list)

Output:

[('1', 'X AG'), ('1', 'X AG'), ('1', 'Z AG'), ('2', 'X AG'), ('2', 'X AG'), ('3', 'LOL AG'), ('1', 'Z AG'), ('1', 'HS AG'), ('2', 'hs ag')]
[('1', 'X AG'), ('1', 'X AG'), ('1', 'Z AG'), ('2', 'X AG'), ('2', 'X AG'), ('3', 'LOL AG'), ('1', 'Z AG'), ('1', 'HS AG'), ('2', 'HS AG')]
[('1', 'X AG'), ('1', 'X AG'), ('1', 'Z AG'), ('2', 'X AG'), ('2', 'X AG'), ('3', 'LOL AG'), ('1', 'Z AG'), ('1', 'HS AG'), ('2', 'HS AG')]

The if repl_val check decides whether the value needs to be replaced. Because repl_val is reset to None at the start of each iteration of the outer loop, the check is only true if a matching name was assigned during the inner loop.
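One detail worth knowing about this check: a plain truthiness test also treats an empty string as "no replacement". That cannot happen with the names in this question, but if it matters for your data, comparing against None is stricter. A small sketch, where choose_truthy and choose_not_none are hypothetical helpers that are not part of the code above:

```python
def choose_truthy(original, repl_val):
    # Mirrors `if repl_val:`; any falsy value (None, "") keeps the original.
    return repl_val if repl_val else original


def choose_not_none(original, repl_val):
    # Only None means "no replacement was found".
    return repl_val if repl_val is not None else original


print(choose_truthy("hs ag", "HS AG"))
print(choose_truthy("hs ag", None))
print(repr(choose_truthy("hs ag", "")))
print(repr(choose_not_none("hs ag", "")))
```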

As for result: the function does not modify the incoming lists. It builds a new list, result, and returns it.
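To see that the input really stays untouched, here is a condensed sketch of the same idea. The name with_real_names and the threshold parameter are assumptions made for this illustration, and the data is shortened:

```python
import difflib


def with_real_names(d_list, c_list, threshold=80):
    """Return a new list; d_list itself is not modified."""
    result = []
    for entity_id, name in d_list:
        repl_val = None
        for cam in c_list:
            ratio = difflib.SequenceMatcher(None, name.lower(), cam.lower()).ratio() * 100
            if name != cam and ratio >= threshold:
                repl_val = cam
        result.append((entity_id, repl_val if repl_val is not None else name))
    return result


original = [("1", "HS AG"), ("2", "hs ag")]
fixed = with_real_names(original, ["LOLLIPOP AG", "HS AG", "X AG", "Z AG"])

print(original)  # still contains ("2", "hs ag")
print(fixed)
```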

As for the second way (which is likely the better one): because of enumerate, we get an index (idx) for each list element as well as its value deb. That lets us assign directly to the original list by its index, so the original list is modified in place.
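As a possible variation on the in-place approach, difflib also offers get_close_matches, which can stand in for the inner loop. This is a swapped-in technique rather than what the code above does. The canonical dictionary is a helper introduced for this sketch, and the cutoff of 0.8 is assumed to correspond to the 80 threshold used earlier:

```python
import difflib

debitor_list = [("1", "HS AG"), ("2", "hs ag"), ("3", "LOL AG")]
real_file = ["LOLLIPOP AG", "HS AG", "X AG", "Z AG"]

# Map lowercase names back to their real spelling for case-insensitive matching.
canonical = {name.lower(): name for name in real_file}

for idx, (entity_id, name) in enumerate(debitor_list):
    hits = difflib.get_close_matches(name.lower(), list(canonical), n=1, cutoff=0.8)
    if hits:
        # Replace the list slot with a new tuple carrying the real name.
        debitor_list[idx] = (entity_id, canonical[hits[0]])

print(debitor_list)
```

Assigning by index while iterating is safe here because the length of the list never changes. Only the best match is kept (n=1), whereas the loops above keep the last name that passes the threshold.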