Sorry, this is likely a complete noob question. I'm new to Python and haven't been able to get any of the online suggestions working. I need to decrease the run-time of the code for larger files, so I need to reduce the number of iterations I'm doing.
How do I modify the append_value function below so it appends only UNIQUE values to dict_obj, removing the need for another series of iterations to do this later on?
EDIT: Sorry, here is an example input/output
Sample Input:
6
5 6
0 1
1 4
5 4
1 2
4 0
Sample Output:
1
4
I'm attempting to solve: http://orac.amt.edu.au/cgi-bin/train/problem.pl?problemid=416
input_file = open("listin.txt", "r")
output_file = open("listout.txt", "w")
ls = []
n = int(input_file.readline())
for i in range(n):
    a, b = input_file.readline().split()
    ls.append(int(a))
    ls.append(int(b))

def append_value(dict_obj, key, value):  # How to append only UNIQUE values to
    if key in dict_obj:                  # dict_obj?
        if not isinstance(dict_obj[key], list):
            dict_obj[key] = [dict_obj[key]]
        dict_obj[key].append(value)
    else:
        dict_obj[key] = value

mx = []
ls.sort()
Dict = {}
for i in range(len(ls)):
    c = ls.count(ls[i])
    append_value(Dict, int(c), ls[i])
    mx.append(c)

x = max(mx)
lss = []
list_set = set(Dict[x])  # To remove the need for this
unique_list = list(list_set)
for x in unique_list:
    lss.append(x)
lsss = sorted(lss)
for i in lsss:
    output_file.write(str(i) + "\n")
output_file.close()
input_file.close()
Thank you
CodePudding user response:
If your intention is to have a list in each value of the dictionary, why not iterate over that list the same way you iterated on each key?
def append_value(dict_obj, key, value):
    if key in dict_obj:
        for elem in dict_obj[key]:  # dict_obj[key], assuming the value is a list
            if elem == value:
                return  # value already present, so skip it
        dict_obj[key].append(value)  # append the value to the desired list
    else:
        dict_obj[key] = [value]
CodePudding user response:
The answer to your question, 'how to only append unique values to this container', is fairly simple: change it from a list to a set (as @ShadowRanger suggested in the comments). This isn't really a question about dictionaries, though; you're not appending values to dict_obj, only to a list stored in the dictionary.
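For example, a set-based version of the question's append_value could look like this (a sketch: each dictionary value is a set, so duplicates are ignored automatically):

```python
def append_value(dict_obj, key, value):
    # Keep a set per key; set.add() is a no-op for values already present.
    if key in dict_obj:
        dict_obj[key].add(value)
    else:
        dict_obj[key] = {value}

d = {}
append_value(d, 2, 5)
append_value(d, 2, 5)  # duplicate, silently ignored
append_value(d, 2, 7)
print(d)  # -> {2: {5, 7}}
```

Note that sets are unordered, so you'd still sort the final values before writing them out.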
Since the source you linked to shows this is a training problem for people newer to coding, you should know that changing the lists to sets might be a good idea, but it's not the cause of the performance issues.
The problem boils down to: given a file containing a list of integers, print the most common integer(s). Your current code iterates over the list, and for each index i, iterates over the entire list to count matches with ls[i] (this is the line c = ls.count(ls[i])).
Some operations are more expensive than others: calling count() is one of the more expensive operations on a Python list, because it reads through the entire list every time it's called. It is an O(n) function inside a loop of length n, so the whole thing takes O(n^2) time. All of the set() filtering for non-unique elements takes O(n) time in total (and is even quite fast in practice). Identifying linear-time functions hidden in loops like this is a frequent theme in optimization, but profiling your code would have identified this.
In general, you'll want to use something like the Counter class in Python's standard library for frequency counting. That kind of defeats the whole point of this training problem, though, which is to encourage you to improve on the brute-force algorithm for finding the most frequent element(s) in a list. One possible way to solve this problem is to read the description of Counter, and try to mimic its behavior yourself with a plain Python dictionary.
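A single-pass frequency count with a plain dictionary might look like this (a sketch; it takes the integers as a list rather than reading the problem's input file):

```python
def most_common(values):
    # One pass to build value -> count, instead of calling list.count() per element.
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    top = max(counts.values())
    # All values tying for the highest count, sorted as the expected output requires.
    return sorted(v for v, c in counts.items() if c == top)

# The sample input's pairs, flattened into one list:
print(most_common([5, 6, 0, 1, 1, 4, 5, 4, 1, 2, 4, 0]))  # -> [1, 4]
```

This is O(n) for the counting plus a sort of the (usually much smaller) set of winners, which is what makes it scale to the larger test files.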