I have a list of the form:
mylist = [([256, 408, 147, 628], 'size'), ([628, 526, 236, 676], 'camera'),
          ([526, 876, 676, 541], 'camera'), ([567, 731, 724, 203], 'size'), .....]
The list has a length of around 8000. It contains many duplicate entries: there are actually only 100 unique words, so I would like to reduce it to a list of length 100 (the number of unique words) by taking the average vector over all occurrences of each word.
For example, my new list will have the form:
newlist = [([411.5, 569.5, 435.5, 415.5], 'size'), .....]
# I have taken the average values for 'size' here
# and want to repeat this for each unique word
The new list will be of length 100.
How would I do this?
CodePudding user response:
You can do this by collecting the vectors for each 'key' into a dict, then working out the average of each element across the lists assigned to that key. Something like:
from statistics import mean

data = [([1, 2, 3, 4], 'size'), ([10, 20, 30, 40], 'camera'),
        ([100, 200, 300, 400], 'camera'), ([10, 20, 30, 40], 'size')]

# group the vectors by their key
ddata = {}
for entry in data:
    key = entry[-1]
    if key not in ddata:
        ddata[key] = []
    ddata[key].append(entry[0])
#print(ddata)

# zip(*v) transposes the grouped vectors so mean() runs element-wise
out = []
for k, v in ddata.items():
    out.append((list(map(mean, zip(*v))), k))
print(out)
# [([5.5, 11, 16.5, 22], 'size'), ([55, 110, 165, 220], 'camera')]
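As a side note on the design choice: the grouping step can be written a little more compactly with collections.defaultdict, which creates the empty list for you on first access. A minimal sketch of the same idea, using the same sample data:

from collections import defaultdict
from statistics import mean

data = [([1, 2, 3, 4], 'size'), ([10, 20, 30, 40], 'camera'),
        ([100, 200, 300, 400], 'camera'), ([10, 20, 30, 40], 'size')]

# defaultdict(list) removes the need for the membership check
ddata = defaultdict(list)
for vector, key in data:
    ddata[key].append(vector)

out = [(list(map(mean, zip(*v))), k) for k, v in ddata.items()]
print(out)
# [([5.5, 11, 16.5, 22], 'size'), ([55, 110, 165, 220], 'camera')]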
CodePudding user response:
You can try this! Let me know if you like it :)
Note that the final output is my_new_list, so check it out by doing:
print(my_new_list)
at the end.
import numpy as np

mylist_names = set([item[1] for item in mylist])  # the unique words
my_new_list = []
for name in mylist_names:
    # collect every vector tagged with this word, then average element-wise
    name_list = [item[0] for item in mylist if item[1] == name]
    name_list = np.mean(name_list, axis=0).tolist()
    my_new_list.append((name_list, name))
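For a quick sanity check, here is the same snippet run on the small four-entry sample from the first answer (an assumed stand-in for your real 8000-entry mylist):

import numpy as np

mylist = [([1, 2, 3, 4], 'size'), ([10, 20, 30, 40], 'camera'),
          ([100, 200, 300, 400], 'camera'), ([10, 20, 30, 40], 'size')]

mylist_names = set([item[1] for item in mylist])
my_new_list = []
for name in mylist_names:
    name_list = [item[0] for item in mylist if item[1] == name]
    name_list = np.mean(name_list, axis=0).tolist()
    my_new_list.append((name_list, name))

print(my_new_list)
# e.g. [([55.0, 110.0, 165.0, 220.0], 'camera'), ([5.5, 11.0, 16.5, 22.0], 'size')]
# (set iteration order is arbitrary, so the two tuples may appear in either order)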