A text file to output conversion in Python

I have a file with n columns. The first n-1 columns hold the values of the n-1 attributes, and the n-th column holds the class value for each row of the dataset. I want to read the dataset and print a single line of output: n-1 comma-separated * characters, followed in the n-th column by the class with the highest frequency. For example, suppose I have a file dataset1.data which contains:

12,13,14,44,0
11,11,10,34,0
22,54,98,11,2
34,90,78,90,1
44,34,34,33,1
22,54,98,11,0
34,90,78,90,2
44,34,34,33,1
22,54,98,11,2
34,90,78,90,2
44,34,34,33,2

For the above case the output will be: *,*,*,*,2 because class 2 has the highest frequency.

If several classes tie for the highest frequency, the smallest class value should be chosen.

For example:

    12,13,14,44,0
    11,11,10,34,0
    22,54,98,11,2
    34,90,78,90,1
    44,34,34,33,1
    22,54,98,11,0
    34,90,78,90,2
    44,34,34,33,1
    22,54,98,11,2

In this case the output will be: *,*,*,*,0 because all the classes have the same frequency and 0 is the smallest class value.

How can I do it? Can anyone help please!

CodePudding user response:

You could use collections.Counter:

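from collections import Counter

cls_counts = Counter()
with open('dataset1.data') as f:
    for line in f:
        # Parse one comma-separated row of integers
        row = list(map(int, line.strip().split(',')))
        # Every column except the last is an attribute; the last is the class
        attrs, cls = row[:-1], row[-1]
        cls_counts[cls] += 1
# Highest frequency among all classes
max_cls_val = max(cls_counts.values())
# All classes that reach that frequency (more than one in case of a tie)
max_cls_keys = [cls for cls, count in cls_counts.items() if count == max_cls_val]
# One '*' per attribute column, then the smallest of the tied classes
print(f"{'*,' * len(attrs)}{min(max_cls_keys)}")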
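
If you want to reuse or test this without touching a file, the same counting logic can be sketched as a function that accepts any iterable of lines. The name `majority_class_line` is hypothetical, and skipping blank lines and raising on empty input are assumptions of mine, not something the question specifies:

```python
from collections import Counter


def majority_class_line(lines):
    """Return '*,...,*,<class>' for an iterable of comma-separated rows.

    Hypothetical helper name; not part of any library.
    """
    cls_counts = Counter()
    n_attrs = 0
    for line in lines:
        line = line.strip()
        if not line:
            # Assumption: blank lines (e.g. a trailing newline) are ignored
            continue
        row = [int(value) for value in line.split(',')]
        n_attrs = len(row) - 1          # every column except the last
        cls_counts[row[-1]] += 1        # last column is the class
    if not cls_counts:
        # Assumption: an empty dataset is treated as an error
        raise ValueError("no data rows found")
    max_count = max(cls_counts.values())
    # On a tie, min() picks the smallest class value
    winner = min(cls for cls, count in cls_counts.items() if count == max_count)
    return ','.join(['*'] * n_attrs + [str(winner)])
```

Since a file object iterates over its lines, something like `with open('dataset1.data') as f: print(majority_class_line(f))` should work with it as well.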

Example Usage 1, Unique class with max count:

dataset1.data:

12,13,14,44,0
11,11,10,34,0
22,54,98,11,2
34,90,78,90,1
44,34,34,33,1
22,54,98,11,0
34,90,78,90,2
44,34,34,33,1
22,54,98,11,2
34,90,78,90,2
44,34,34,33,2

Output:

*,*,*,*,2

Example Usage 2, Multiple classes with max count:

dataset1.data:

12,13,14,44,0
11,11,10,34,0
22,54,98,11,2
34,90,78,90,1
44,34,34,33,1
22,54,98,11,0
34,90,78,90,2
44,34,34,33,1
22,54,98,11,2

Output:

*,*,*,*,0
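
As an alternative to building the list of tied classes, the tie-break can be folded into a single `min` call with a tuple key. This is only a sketch of the selection step, with a hypothetical function name `pick_class`; it takes the class column as a plain list, here copied from the two example datasets above:

```python
from collections import Counter


def pick_class(classes):
    """Most frequent class; ties go to the smallest class value."""
    counts = Counter(classes)
    # Key: negated count (so a higher count sorts first), then the class itself
    return min(counts, key=lambda cls: (-counts[cls], cls))


# Class columns of the two example datasets
example1 = [0, 0, 2, 1, 1, 0, 2, 1, 2, 2, 2]
example2 = [0, 0, 2, 1, 1, 0, 2, 1, 2]

print(pick_class(example1))  # 2
print(pick_class(example2))  # 0
```

The tuple key compares by count first and falls back to the class value only when the counts are equal, which is the tie rule described in the question.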