Python: find substring in string by index

I am building some lists and need to exclude some cases before they are zipped. Each item in the lists follows a similar naming pattern: "A001_AA", "A002_AA", etc. What I would like to do is zip the lists (pair each item in listA with each item in listB) while leaving out the pairs where the code repeats. The comparison needs to be based on the first 4 characters of each string.

Below I have included what I would like my output to look like.

listA = "A001_AA", "A002_AA", "A003_AA", "A004_AA"
listB = "A001_BB", "A002_BB", "A003_BB", "A004_BB"
listZipped = ("A001_AA", "A002_BB"), ("A001_AA", "A003_BB"), ("A001_AA", "A004_BB"), ("A002_AA", "A001_BB")  # etc.

So I essentially need to be able to do something like:

for i in listA:
    for x in listB:
        if i[:4] == x[:4]:  # first 4 characters match
            continue  # do not add to zipped list

I hope this makes sense.

CodePudding user response:

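This works! The slice [0:4] takes the first four characters of each string, and a pair is kept only when those prefixes differ. Please let me know if there are any questions.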

listA = ["A001_AA", "A002_AA", "A003_AA", "A004_AA"]
listB = ["A001_BB", "A002_BB", "A003_BB", "A004_BB"]
listZipped = []

for i in listA:
    for j in listB:
        if i[0:4] != j[0:4]:
            listZipped.append((i, j))

print(listZipped)

Alternatively, if you find this more readable, you can replace the nested for loop with a list comprehension:

listZipped = [(i, j) for i in listA for j in listB if i[0:4] != j[0:4]]

Output

[('A001_AA', 'A002_BB'), ('A001_AA', 'A003_BB'), ('A001_AA', 'A004_BB'), ('A002_AA', 'A001_BB'), ('A002_AA', 'A003_BB'), ('A002_AA', 'A004_BB'), ('A003_AA', 'A001_BB'), ('A003_AA', 'A002_BB'), ('A003_AA', 'A004_BB'), ('A004_AA', 'A001_BB'), ('A004_AA', 'A002_BB'), ('A004_AA', 'A003_BB')]
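
One more note: the built-in zip() pairs items position by position, so on its own it would not give the all-combinations output shown above. If it helps, here is a minimal sketch of the same filter written with itertools.product from the standard library. The function name pairs_with_different_prefix and the prefix_len parameter are made up for illustration and are not part of the original answer.

```python
from itertools import product


def pairs_with_different_prefix(left, right, prefix_len=4):
    """Return every (a, b) pair whose first `prefix_len` characters differ.

    Hypothetical helper: the name and the `prefix_len` parameter are
    assumptions for this sketch.
    """
    return [
        (a, b)
        for a, b in product(left, right)  # every combination of left x right
        if a[:prefix_len] != b[:prefix_len]  # skip pairs sharing a prefix
    ]


listA = ["A001_AA", "A002_AA", "A003_AA", "A004_AA"]
listB = ["A001_BB", "A002_BB", "A003_BB", "A004_BB"]

listZipped = pairs_with_different_prefix(listA, listB)
print(len(listZipped))  # 12 (16 combinations minus the 4 with matching prefixes)
print(listZipped[:3])
```

The pairs come out in the same order as the nested loop, and keeping the prefix length as a parameter makes it easy to adjust if the codes ever change length.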