I want to write Python code that will be run with the following command:
python3 myProgram.py 4 A B C D stemfile
Here 4 is the number of files and A, B, C, D are the 4 files. I then want to generate all combinations of A, B, C, D except the empty one: A, B, C, D, AB, AC, AD, BC, BD, CD, ABC, ABD, ACD, BCD, ABCD.
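A minimal sketch of the "all non-empty combinations" step: call itertools.combinations once for every size r from 1 to n and chain the results. The helper name nonempty_combinations is hypothetical, not something from the question.

```python
from itertools import chain, combinations

def nonempty_combinations(items):
    """Return every non-empty combination of items, shortest first."""
    return list(chain.from_iterable(
        combinations(items, r) for r in range(1, len(items) + 1)
    ))

combos = nonempty_combinations(['A', 'B', 'C', 'D'])
print(len(combos))                    # 15
print([''.join(c) for c in combos])   # ['A', 'B', ..., 'BCD', 'ABCD']
```

Because combinations preserves the input order, the tuples come out in the same order as the list in the question.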
Before that, the program reads stemfile.names. If stemfile.names contains the line "| Final Pseudo Deletion Count is 0.", it generates the 15 combinations above. Otherwise it reports noisy data and leaves D out of every combination, so the output will be: A, B, C, AB, AC, BC, ABC.
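One way to do the stemfile check, sketched under the assumption that the marker must appear as a whole line (the question's own code uses a plain substring test, which would also match something like "is 0.5"). The names CLEAN_MARKER and is_clean are hypothetical, and the demo writes to a temporary directory instead of a real stemfile.names.

```python
import tempfile
from pathlib import Path

# The marker line quoted in the question; its presence means the data is clean.
CLEAN_MARKER = '| Final Pseudo Deletion Count is 0.'

def is_clean(stem):
    """Return True if <stem>.names has a line that is exactly the marker."""
    text = Path(stem + '.names').read_text()
    # Compare whole lines (ignoring surrounding whitespace) rather than substrings.
    return any(line.strip() == CLEAN_MARKER for line in text.splitlines())

# Demo with a throwaway directory so no real data files are touched.
with tempfile.TemporaryDirectory() as tmp:
    stem = str(Path(tmp) / 'stemfile')
    Path(stem + '.names').write_text('header\n' + CLEAN_MARKER + '\n')
    clean_result = is_clean(stem)
    Path(stem + '.names').write_text('header\n| Final Pseudo Deletion Count is 3.\n')
    noisy_result = is_clean(stem)

print(clean_result, noisy_result)  # True False
```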
In my code I always assumed D is the last file argument and ran the loop one time fewer. But D is not always the last argument; the command can also look like:
python3 myProgram.py 4 B D C A stemfile
In that case my code leaves out A instead of D when building the combinations. Whenever that line is not found in stemfile.names, I just want to remove the D file from the equation, wherever it appears in the argument list. How should I do that?
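The core fix is to filter the file list by value instead of slicing it by position. A minimal sketch, assuming the file to exclude is literally named D; the argv list below is a hard-coded stand-in for sys.argv, and select_files is a hypothetical helper.

```python
# Simulated sys.argv for: python3 myProgram.py 4 B D C A stemfile
argv = ['myProgram.py', '4', 'B', 'D', 'C', 'A', 'stemfile']

number = int(argv[1])
files = argv[2:number + 2]  # the file names, in whatever order they were given
stem = argv[number + 2]     # the stemfile name comes right after them

def select_files(files, clean):
    """Keep every file for clean data; otherwise drop 'D' wherever it appears."""
    if clean:
        return list(files)
    return [f for f in files if f != 'D']

print(select_files(files, clean=True))   # ['B', 'D', 'C', 'A']
print(select_files(files, clean=False))  # ['B', 'C', 'A']
```

The clean flag would come from the stemfile check; the rest of the program then works on the filtered list, so the combination and output-file code never needs to know where D was.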
Later in the code, when the combination is just A, it stores A in a separate output file; when it is AB, it stores the union of the A and B files in a separate file, and so on for all the combinations. Here too, if the data is noisy, the D file must not appear in any of the output files.
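A sketch of the "one output file per combination" step, reusing the question's set-based union. The helper write_unions and the out_dir parameter are hypothetical; the output names follow the question's "<names>_output" pattern, and the demo runs in a temporary directory.

```python
import tempfile
from itertools import combinations
from pathlib import Path

def union(paths):
    """Return the set of distinct lines across all the given files."""
    lines = set()
    for p in paths:
        with open(p) as fin:
            lines.update(fin.readlines())
    return lines

def write_unions(paths, out_dir):
    """Write one <names>_output file per non-empty combination of paths."""
    written = []
    for r in range(1, len(paths) + 1):
        for combo in combinations(paths, r):
            name = ''.join(Path(p).name for p in combo) + '_output'
            with open(Path(out_dir) / name, 'w') as fout:
                # Sort so the output is deterministic (sets are unordered).
                fout.writelines(sorted(union(combo)))
            written.append(name)
    return written

with tempfile.TemporaryDirectory() as tmp:
    a, b = Path(tmp) / 'A', Path(tmp) / 'B'
    a.write_text('x\ny\n')
    b.write_text('y\nz\n')
    written = write_unions([a, b], tmp)
    ab_text = (Path(tmp) / 'AB_output').read_text()

print(written)        # ['A_output', 'B_output', 'AB_output']
print(repr(ab_text))  # 'x\ny\nz\n'
```

This assumes every input file ends with a newline; otherwise the last line of one file can run into the next line written to the output.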
One more example: if I run
python3 myProgram.py 3 A D B stemfile
and stemfile.names doesn't contain the line "| Final Pseudo Deletion Count is 0.", then the output combinations are A, B, AB, and only 3 output files are created, one per combination.
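A quick sanity check on how many combinations (and therefore output files) each case produces: with n files there are C(n,1) + ... + C(n,n) = 2^n - 1 non-empty combinations. So 4 files give 15, the noisy 4-file case with 3 remaining gives 7, and the 3-file example, after dropping D, leaves 2 files and 3 combinations (A, B, AB). The helper count_combinations is hypothetical; math.comb needs Python 3.8 or later.

```python
from math import comb

def count_combinations(n):
    """Number of non-empty combinations of n files: sum of C(n, r) for r = 1..n."""
    return sum(comb(n, r) for r in range(1, n + 1))

for n in (4, 3, 2):
    # Each non-empty combination gets its own output file.
    print(n, count_combinations(n), 2 ** n - 1)
```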
Below I am attaching my code:
import sys
import itertools
from itertools import combinations

def union(files):
    lines = set()
    for file in files:
        with open(file) as fin:
            lines.update(fin.readlines())
    return lines

def main():
    number = int(sys.argv[1])
    dataset = sys.argv[number + 2]
    with open(dataset + '.names') as myfile:
        if '| Final Pseudo Deletion Count is 0.' in myfile.read():
            a_list = sys.argv[2:number + 2]
            print("All possible combinations:\n")
            for L in range(1, len(a_list) + 1):
                for subset in itertools.combinations(a_list, L):
                    print(*list(subset), sep=',')
            print("...............................")
            matrix = [itertools.combinations(a_list, r)
                      for r in range(1, len(a_list) + 1)]
            combinations = [c for combinations in matrix for c in combinations]
            for combination in combinations:
                filenames = [f'{name}' for name in combination]
                output = f'{"".join(combination)}_output'
                print(f'Writing union of {filenames} to {output}')
                with open(output, 'w') as fout:
                    fout.writelines(union(filenames))
        else:
            a_list = sys.argv[2:number + 1]
            # Here I am reducing a number only
            print("Noisy data.\n")
            print("So all possible combinations:\n")
            for L in range(1, len(a_list) + 1):
                for subset in itertools.combinations(a_list, L):
                    print(*list(subset), sep=',')
            print("................................")
            matrix = [itertools.combinations(a_list, r)
                      for r in range(1, len(a_list) + 1)]
            combinations = [c for combinations in matrix for c in combinations]
            for combination in combinations:
                filenames = [f'{name}' for name in combination]
                output = f'{"".join(combination)}_output'
                print(f'Writing union of {filenames} to {output}')
                with open(output, 'w') as fout:
                    fout.writelines(union(filenames))

if __name__ == '__main__':
    main()
Please help me out.
Answer:
I think you should probably break this down into smaller, more specific questions. It seems like there is a lot of detail here that's not focused on the specific problem you're facing. I took a shot at what I think you're asking, however.
I think you're trying to figure out how to remove an item from the command-line arguments. If that's the case, there's nothing you can do about what's passed to the program, but you can modify the list of inputs after you parse it. I really think you should read about the argparse library, as I said in my comment. I'm not sure if it's exactly what you're looking for, but here's some code using argparse that expects the full filename for each input file. The last argument must be the stemfile, also given by its full filename (e.g. stemfile.names).
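To see how argparse splits the arguments, here is a small sketch of the same parser layout: a positional with nargs='+' followed by a single positional. argparse gives the last argument to stemfile and everything before it to filenames, so the leading file count from the original command is no longer needed. Passing an explicit list to parse_args is just for demonstration; with no argument it reads sys.argv[1:].

```python
import argparse
import pathlib

def get_parser():
    parser = argparse.ArgumentParser()
    # One or more input files, then exactly one stemfile at the end.
    parser.add_argument('filenames', type=pathlib.Path, nargs='+')
    parser.add_argument('stemfile', type=pathlib.Path)
    return parser

args = get_parser().parse_args(['B', 'D', 'C', 'A', 'stemfile.names'])
print([p.name for p in args.filenames])  # ['B', 'D', 'C', 'A']
print(args.stemfile)                     # stemfile.names
```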
Once the arguments are parsed, you have a list of pathlib.Path objects, and you can simply remove the D file from that list, whatever its position.
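A small illustration of filtering Path objects by stem, with hypothetical filenames that include directories and suffixes. Path.stem is the final path component without its last suffix, so data/D.txt has stem D; note that a name like D.tar.gz would have stem D.tar, so this check assumes a single suffix at most.

```python
from pathlib import Path

# Hypothetical inputs: full filenames, possibly with directories and suffixes.
filenames = [Path('data/B.txt'), Path('data/D.txt'), Path('C.txt'), Path('A')]

# Keep every file whose stem is not 'D', regardless of its position in the list.
kept = [p for p in filenames if p.stem != 'D']

print([p.name for p in kept])  # ['B.txt', 'C.txt', 'A']
```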
import argparse
import itertools
import pathlib

# Presence of this line in the stemfile means the data is clean.
CLEAN_DATA_LINE = '| Final Pseudo Deletion Count is 0.'

def get_parser():
    parser = argparse.ArgumentParser()
    # One or more input files, followed by exactly one stemfile.
    parser.add_argument('filenames', type=pathlib.Path, nargs='+')
    parser.add_argument('stemfile', type=pathlib.Path)
    return parser

def union(files):
    lines = set()
    for file in files:
        with open(file) as fin:
            lines.update(fin.readlines())
    return lines

def main():
    parser = get_parser()
    args = parser.parse_args()
    stemfile_lines = args.stemfile.read_text().splitlines()
    if any(line.strip() == CLEAN_DATA_LINE for line in stemfile_lines):
        filenames = args.filenames
    else:
        # Noisy data: drop D wherever it appears in the argument list.
        print('Noisy data.')
        filenames = [p for p in args.filenames if p.stem != 'D']
    matrix = [itertools.combinations(filenames, r) for r in range(1, len(filenames) + 1)]
    combinations = [c for combinations in matrix for c in combinations]
    print(' '.join([str([p.stem for p in c]) for c in combinations]))
    for combination in combinations:
        output = f'{"".join([p.stem for p in combination])}_output.txt'
        print(f'Writing union of {[p.stem for p in combination]} to {output}')
        with open(output, 'w') as fout:
            # Write the union of this combination only, not of every input file.
            fout.writelines(union(combination))

if __name__ == '__main__':
    main()