Getting union of files after the combination generated from command line in Python

Time:11-20

I want to write a Python program that takes n command-line arguments.

For example: python3 myProgram.py 3 A B C

In the above example n = 3 and the 3 arguments are A, B, C.

First, I want to generate all combinations of those n arguments except the empty one. For the above example these are: A, B, C, AB, AC, BC, ABC

So I will get 2^n - 1 combinations.

For this part I am trying:

import sys
import itertools 
from itertools import combinations

number = int(sys.argv[1])

a_list = list(sys.argv[2:number + 2])

all_combinations = []
for r in range(len(a_list) + 1):
    combinations_object = itertools.combinations(a_list, r)
    combinations_list = list(combinations_object)
    all_combinations += combinations_list

print(all_combinations)

But here I am unable to remove the empty combination.

Initially I have n files in the same directory. In the above case, for example, I have 3 files: A.txt, B.txt, C.txt

For each combination I then want to generate an output file like this:

When it is only A then the outputfile_1 = A.txt

When it is only B then the outputfile_2 = B.txt

When it is only C then the outputfile_3 = C.txt

When it is AB then the outputfile_4 = union (A.txt, B.txt)

...

so on

When it is ABC then the outputfile_7 = union (A.txt, B.txt, C.txt)

So for the above example, if I run the code as python3 myProgram.py 3 A B C, I will get 7 output files.

And if I run python3 myProgram.py 4 A B C D, I will get 15 output files.

For the union, I am trying to use this logic:

with open("A.txt") as fin1: lines = set(fin1.readlines())
with open("B.txt") as fin2: lines.update(set(fin2.readlines()))
with open("outputfile_4.txt", 'w') as fout: fout.write('\n'.join(list(lines)))

But I don't understand how to combine these two pieces to get my desired outcome. Please help me out.

CodePudding user response:

I think this is probably two separate questions. The first is how to get all of the combinations where n is greater than 0. @timus was on the right track there. To make their answer more complete:

  1. Use a list comprehension to generate a list of itertools.combinations objects
  2. Use a nested list comprehension to flatten it into a one-dimensional list of tuples

matrix = [itertools.combinations(a_list, r) for r in range(1, len(a_list) + 1)]
combinations = [c for combinations in matrix for c in combinations]
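
As a variation on the two steps above, the flattening can also be done with itertools.chain.from_iterable. This is only a sketch: the helper name non_empty_combinations is made up for illustration and is not part of the question's code.

```python
import itertools

def non_empty_combinations(items):
    """Return every combination of `items` that has at least one element."""
    sizes = range(1, len(items) + 1)  # starting at 1 skips the empty combination
    return list(itertools.chain.from_iterable(
        itertools.combinations(items, r) for r in sizes
    ))

combos = non_empty_combinations(["A", "B", "C"])
print(len(combos))                    # 7, i.e. 2**3 - 1
print(["".join(c) for c in combos])   # ['A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC']
```

Starting the range at 1 instead of 0 is what drops the empty tuple, so no filtering step is needed afterwards.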

The second question seems a bit less clear. I'm not sure if it's how to iterate over the combinations, how to get filenames from the combinations, or something else. I've provided a sample implementation below (Python 3.6+, since it uses f-strings).

import sys
import itertools 

def union(files):
    lines = set()
    for file in files:
        with open(file) as fin:
            lines.update(fin.readlines())
    return lines

def main():
    number = int(sys.argv[1])
    a_list = sys.argv[2:number + 2]

    matrix = [itertools.combinations(a_list, r) for r in range(1, len(a_list) + 1)]
    combinations = [c for combinations in matrix for c in combinations]
    for combination in combinations:
        filenames = [f'{name}.txt' for name in combination]
        output = f'{"".join(combination)}_output.txt'
        print(f'Writing union of {filenames} to {output}')
        with open(output, 'w') as fout:
            fout.writelines(union(filenames))

if __name__ == '__main__':
    main()
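
If the numbered names from the question (outputfile_1.txt, outputfile_2.txt, ...) are wanted instead, something like the following sketch might work. The function names union_lines and write_unions, the directory parameter, and the demo file contents are all assumptions made for illustration. It also strips trailing newlines before comparing lines, so a last line without a newline is not treated as different from the same line elsewhere, and it sorts the result only to make the output order predictable.

```python
import itertools
import tempfile
from pathlib import Path

def union_lines(paths):
    """Collect the distinct lines of all files, ignoring trailing newlines."""
    lines = set()
    for path in paths:
        with open(path) as fin:
            lines.update(line.rstrip("\n") for line in fin)
    return sorted(lines)  # sorted only so the output order is deterministic

def write_unions(names, directory="."):
    """Write outputfile_<i>.txt for each non-empty combination of `names`."""
    directory = Path(directory)
    combos = itertools.chain.from_iterable(
        itertools.combinations(names, r) for r in range(1, len(names) + 1)
    )
    written = []
    for index, combo in enumerate(combos, start=1):
        sources = [directory / f"{name}.txt" for name in combo]
        target = directory / f"outputfile_{index}.txt"
        target.write_text("".join(line + "\n" for line in union_lines(sources)))
        written.append(target)
    return written

# Small demonstration in a temporary directory (file contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "A.txt").write_text("x\ny\n")
    Path(tmp, "B.txt").write_text("y\nz")  # note: no trailing newline
    outputs = write_unions(["A", "B"], tmp)
    demo = {p.name: p.read_text() for p in outputs}

print(demo)
```

In a real script, write_unions(sys.argv[2:]) could replace the loop in main(); the count argument is then redundant because the list length already gives n.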