I have 2 lists:
Example 1:-
a=['AA','AA','AA','AA','BB','BB','BB','BB','CC','CC','CC','CC','CC']
c = ['xyz', 'xyz', 'yy', 'xyz', 'zz', 'zy', 'zy', 'zy', 'll', 'll', 'll', 'lz', 'lz']
Example 2:-
a=['AA','AA','AA','AA','AA','AA','AA','AA','AA','BB','BB','BB','BB','CC','CC','CC','CC','CC']
c = ['','','','','','xyz', 'xyz', 'yy', 'xyz', 'zz', 'zy', 'zy', 'zy', 'll', 'll', 'll', 'lz', 'lz']
In list a the values repeat in consecutive groups (AA, BB and CC). For the indices covered by each group, I want to change the values in list c.
Within each group's index range, whichever value occurs most often in c should replace all the other values in that range.
Expected output of list c in example 1:-
c=['xyz','xyz','xyz','xyz','zy','zy','zy','zy','ll','ll','ll','ll','ll']
Expected output of list c in example 2:-
c = ['','','','','','xyz', 'xyz', 'xyz', 'xyz', 'zy', 'zy', 'zy', 'zy', 'll', 'll', 'll', 'll', 'll']
In example 1, AA is repeated four times, so we look at the first 4 values in list c and replace them all with the value that occurs most often among them. The same applies to BB and CC.
In example 2 I want to keep the empty strings '' as they are; the remaining logic should be the same as in example 1. The empty string '' appears five times in the input and should stay unchanged in the expected output, while the non-empty values follow the same rule.
CodePudding user response:
itertools.groupby with zip gives you slices of c based on consecutive equal values in a. With collections.Counter the most frequent value can be determined per group.
from itertools import groupby
from collections import Counter

a = ['AA', 'AA', 'AA', 'AA', 'BB', 'BB', 'BB', 'BB', 'CC', 'CC', 'CC', 'CC', 'CC']
c = ['xyz', 'xyz', 'yy', 'xyz', 'zz', 'zy', 'zy', 'zy', 'll', 'll', 'll', 'lz', 'lz']
c_new = []
# zip a and c and group by a_i
for _, group in groupby(zip(a, c), key=lambda x: x[0]):
    # get the c values from the resulting [(a_i, c_i)] list
    c_elems = [x[1] for x in group]
    # count them, excluding ''
    counts = Counter(x for x in c_elems if x)
    # get the most frequent value ('' if the group holds only empty strings)
    c_max = max(counts, key=counts.get, default='')
    # append '' or c_max once for every element in the group
    for c_elem in c_elems:
        c_new.append(c_max if c_elem else c_elem)
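To check the loop above against the second example from the question, it can be wrapped in a function. This is only a minimal sketch: the name `fill_with_mode` is hypothetical, and falling back to '' for a group that contains nothing but empty strings is an assumption the question does not spell out.

```python
from collections import Counter
from itertools import groupby


def fill_with_mode(a, c):
    """Replace each non-empty value in c with the most frequent non-empty
    value of its group; groups are consecutive runs of equal labels in a."""
    result = []
    for _, group in groupby(zip(a, c), key=lambda pair: pair[0]):
        values = [value for _, value in group]
        counts = Counter(v for v in values if v)  # ignore '' when counting
        # Assumption: a group made only of '' has no mode, so fall back to ''.
        mode = max(counts, key=counts.get, default='')
        # Keep '' untouched, overwrite everything else with the mode.
        result.extend(mode if v else v for v in values)
    return result


# Example 2 from the question
a2 = ['AA'] * 9 + ['BB'] * 4 + ['CC'] * 5
c2 = ['', '', '', '', '', 'xyz', 'xyz', 'yy', 'xyz',
      'zz', 'zy', 'zy', 'zy', 'll', 'll', 'll', 'lz', 'lz']
print(fill_with_mode(a2, c2))
```

When two values tie for most frequent, max returns the one that appeared first in the group, because Counter keeps insertion order.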
CodePudding user response:
A very primitive and naïve, but easily illustrated solution. It rescans both lists for every element, so it is quadratic, and it treats every occurrence of a label in a as one group even when the occurrences are not adjacent:
>>> a=['AA','AA','AA','AA','BB','BB','BB','BB','CC','CC','CC','CC','CC']
>>> c=['xyz','xyz','yy','xyz','zz','zy','zy','zy','ll','ll','ll','lz','lz']
>>> # for each position, the non-empty c values that share its label in a
>>> groups = [[y for x, y in zip(a, c) if x == i and y] for i in a]
>>> # keep '' as is, otherwise take the most frequent value of the group
>>> e = [max(g, key=g.count) if v else v for g, v in zip(groups, c)]
>>> e
['xyz', 'xyz', 'xyz', 'xyz', 'zy', 'zy', 'zy', 'zy', 'll', 'll', 'll', 'll', 'll']
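The repeated scans of the lists above can be avoided by counting once per label. The following is a sketch, not a drop-in replacement: `fill_by_label` is a hypothetical name, and it assumes that all occurrences of a label in a belong to one group even when they are not adjacent, which differs from the groupby approach.

```python
from collections import Counter, defaultdict


def fill_by_label(a, c):
    """Single pass to count, single pass to rewrite (linear time overall)."""
    counts = defaultdict(Counter)
    for label, value in zip(a, c):
        if value:  # empty strings are never counted
            counts[label][value] += 1
    # Most frequent non-empty value for every label that has at least one.
    modes = {label: counter.most_common(1)[0][0]
             for label, counter in counts.items()}
    # '' stays as it is; any other value is replaced by its label's mode.
    return [modes[label] if value else value for label, value in zip(a, c)]


a = ['AA'] * 4 + ['BB'] * 4 + ['CC'] * 5
c = ['xyz', 'xyz', 'yy', 'xyz', 'zz', 'zy', 'zy', 'zy', 'll', 'll', 'll', 'lz', 'lz']
print(fill_by_label(a, c))
```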