Home > Software engineering >  Generate Unique Pairs From Two Lists Without Itertools
Generate Unique Pairs From Two Lists Without Itertools

Time:06-02

I am looping over a list twice and want to catch all unique pairs, rather than all combinations- ie. order in the pair doesn't matter

listy=[0,1,2]
out=[]
for i in listy:
   for j in listy:
      out.append([i,j])

The output I'm getting is [[0,0],[0,1],[0,2],[1,0],[1,1],[1,2],[2,0],[2,1],[2,2]]

What I am looking for is [[0,0],[0,1],[0,2],[1,1],[1,2],[2,2]]

One possible solution is,

listy=[0,1,2]
out=[]
for i in listy:
   for j in listy:
      pair=set([i,j])
      if pair not in out:
          out.append(pair)

This produces [{0},{0,1},{0,2},{1},{1,2},{2}]

However this creates inefficiency in what is a heavy script (the lists are long) and I also do not want a list of sets. I want a list of lists.

Is there a better way to achieve this without using itertools (I want an implementation I can also apply to javascript without too much rethinking)

CodePudding user response:

The most straightforward translation of itertools.combinations_with_replacement(listy, 2) (which matches the behavior you desire) that directly generates the results you want from an input list is:

listy = [0,1,2]
out = []
for idx, i in enumerate(listy):
   for j in listy[idx:]:
      out.append([i, j])

Only change is the use of enumerate (to get the current index being iterated as you iterate) and slicing the listy used in the inner loop (so it starts at the same index as the current run of the outer loop).

This gets the exact result requested with minimal overhead (it does make shallow copies of the list, of decreasing size, once for each inner loop, but this is fairly quick in Python; unless the list is huge, it should be a pretty minimal cost). If you need to avoid slicing, you can make the inner loop an index-based loop with indexing (but in practice, the overhead of indexing is high enough that it'll often lose to the slicing):

listy = [0,1,2]
out = []
for idx, i in enumerate(listy):
   for idxj in range(idx, len(listy)):
      out.append([i, j[idxj]])

CodePudding user response:

If you're worried about creating a list of sets instead of a lists of lists, you can replace add the list method to line 5 on your possible solution above:

pair = list(set([i,j]))

The problem with using the set method is when i and j are the same, you return 1 entry instead of 2. You may want to add and if statement there that looks like this:

if i == j:
    pair = [i,j]
else:
    pair=set([i,j])
if pair not in out:
    out.append(pair)

The other thing you could do and I'm not sure about performance with your lists but you could perform SQL and join the two lists. I've done this on dataframes but I'm not sure if you could do this with lists. The join would look like this:

select i, j from (
select min(a.i, b.j) as i, max(a.i,b.j) as j 
from listy a 
join listy b on 1 = 1
) x 
group by i, j

The min and the max handle the proper ordering so if you had 2 and 0 then the pair would be represented as 0,2. The join criteria makes it so that you join across all values. The group by wrapping the query would then handle all your duplicates.

CodePudding user response:

I don't know javascript at all, but if there is something analogous to list comprehension I would try this:

listy = [0,1,2]
pairs = [[i, j] for i in listy for j in listy if i <= j]

The content of pairs is exactly as you wanted:

[[0, 0], [0, 1], [0, 2], [1, 1], [1, 2], [2, 2]]

CodePudding user response:

Option 1 - "minor fix"

A trivial "fix" would be:

listy=[0,1,2]
out=set()
for i in listy:
    for j in listy:
        if (i, j) not in out and (j, i) not in out:
            out.add((i,j))

The result is:

{(0, 1), (1, 2), (0, 0), (1, 1), (0, 2), (2, 2)}

However this is not an efficient implementation, because we have to check twice if an element is in the list.

Option 2 - More efficient implementation

You could achieve your goal using a trivial scan of the array:

listy=[0,1,2]
out = [(i, j) for i in range(len(listy)) for j in range(i, len(listy))]

NOTE: I use tuples for the pairs, you could easily change that into a list of lists using:

out = [[i, j] for i in range(len(listy)) for j in range(i, len(listy))]
  • Related