Get the sum of elements if the previous elements are the same in a Python list

Time:09-23

I am trying to get the sum of the last item, item[3], when the first three elements are the same.

For example, ([2810], ['C'], ['T'], [40]), ([2810], ['C'], ['T'], [40]), and every other item in the list that shares the same first three elements should be combined into ([2810], ['C'], ['T'], [the sum of item[3] over all items whose first three elements are [2810], ['C'], ['T']]).

*Cases like ([2792, 2810], ['C', 'C'], ['T', 'T'], [40, 40]) should be counted as two separate cases, e.g. ([2792], ['C'], ['T'], [40]) and ([2810], ['C'], ['T'], [40]).

[([2792], ['C'], ['T'], [39]), ([2810], ['C'], ['T'], [40]), ([586], ['G'], ['A'], [40]), ([586], ['G'], ['A'], [40]), ([832], ['G'], ['A'], [40]), ([2810], ['C'], ['T'], [40]), ([2792, 2810], ['C', 'C'], ['T', 'T'], [40, 40]), ([2730], ['A'], ['G'], [40]), ([4623, 4624], ['A', 'T'], ['G', 'C'], [29, 12]), ([2810], ['C'], ['T'], [40]), ([4687], ['T'], ['G'], [22]), ([2730], ['A'], ['G'], [40]), ([3493], ['G'], ['T'], [40]), ([2730], ['A'], ['G'], [40]), ([2810], ['C'], ['T'], [40]), ([832], ['G'], ['A'], [40]), ([444, 471], ['A', 'A'], ['T', 'T'], [10, 15]), ([2730], ['A'], ['G'], [40]), ([784], ['T'], ['A'], [27]), ([2730], ['A'], ['G'], [40]), ([2730], ['A'], ['G'], [40]), ([2792, 2810], ['C', 'C'], ['T', 'T'], [40, 40]), ([5373], ['T'], ['C'], [31]), ([3131], ['G'], ['A'], [40]), ([2730], ['A'], ['G'], [40]), ([2810], ['C'], ['T'], [40]), ([2792, 2810], ['C', 'C'], ['T', 'T'], [40, 40]), ([586], ['G'], ['A'], [40]), ([3578], ['A'], ['T'], [40]), ([2810], ['C'], ['T'], [40]), ([2730], ['A'], ['G'], [39]), ([832], ['G'], ['A'], [40]), ([2810], ['C'], ['T'], [40]), ([832], ['G'], ['A'], [38]), ([4248], ['T'], ['A'], [33]), ([832], ['G'], ['A'], [39]), ([2792], ['C'], ['T'], [40]), ([586], ['G'], ['A'], [40]), ([832], ['G'], ['A'], [40]), ([2730], ['A'], ['G'], [40]), ([2730], ['A'], ['G'], [40]), ([2730], ['A'], ['G'], [38]), ([2810], ['C'], ['T'], [40]), ([832], ['G'], ['A'], [40]), ([2730], ['A'], ['G'], [37]), ([4146, 4173], ['A', 'T'], ['T', 'G'], [33, 9]), ([99, 103], ['A', 'A'], ['C', 'C'], [24, 28]), ([99, 108], ['A', 'A'], ['C', 'C'], [19, 28]), ([882], ['T'], ['A'], [40]), ([2663], ['T'], ['A'], [23]), ([832], ['G'], ['A'], [40]), ([2792], ['C'], ['T'], [40])]

CodePudding user response:

One option is pandas: first use explode on the columns to unpack the list values, then groupby the first three columns and sum the fourth.

data =  [([2792], ['C'], ['T'], [39]), ([2810], ['C'], ['T'], [40]), ([586], ['G'], ['A'], [40]), ([586], ['G'], ['A'], [40]), ([832], ['G'], ['A'], [40]), ([2810], ['C'], ['T'], [40]), ([2792, 2810], ['C', 'C'], ['T', 'T'], [40, 40]), ([2730], ['A'], ['G'], [40]), ([4623, 4624], ['A', 'T'], ['G', 'C'], [29, 12]), ([2810], ['C'], ['T'], [40]), ([4687], ['T'], ['G'], [22]), ([2730], ['A'], ['G'], [40]), ([3493], ['G'], ['T'], [40]), ([2730], ['A'], ['G'], [40]), ([2810], ['C'], ['T'], [40]), ([832], ['G'], ['A'], [40]), ([444, 471], ['A', 'A'], ['T', 'T'], [10, 15]), ([2730], ['A'], ['G'], [40]), ([784], ['T'], ['A'], [27]), ([2730], ['A'], ['G'], [40]), ([2730], ['A'], ['G'], [40]), ([2792, 2810], ['C', 'C'], ['T', 'T'], [40, 40]), ([5373], ['T'], ['C'], [31]), ([3131], ['G'], ['A'], [40]), ([2730], ['A'], ['G'], [40]), ([2810], ['C'], ['T'], [40]), ([2792, 2810], ['C', 'C'], ['T', 'T'], [40, 40]), ([586], ['G'], ['A'], [40]), ([3578], ['A'], ['T'], [40]), ([2810], ['C'], ['T'], [40]), ([2730], ['A'], ['G'], [39]), ([832], ['G'], ['A'], [40]), ([2810], ['C'], ['T'], [40]), ([832], ['G'], ['A'], [38]), ([4248], ['T'], ['A'], [33]), ([832], ['G'], ['A'], [39]), ([2792], ['C'], ['T'], [40]), ([586], ['G'], ['A'], [40]), ([832], ['G'], ['A'], [40]), ([2730], ['A'], ['G'], [40]), ([2730], ['A'], ['G'], [40]), ([2730], ['A'], ['G'], [38]), ([2810], ['C'], ['T'], [40]), ([832], ['G'], ['A'], [40]), ([2730], ['A'], ['G'], [37]), ([4146, 4173], ['A', 'T'], ['T', 'G'], [33, 9]), ([99, 103], ['A', 'A'], ['C', 'C'], [24, 28]), ([99, 108], ['A', 'A'], ['C', 'C'], [19, 28]), ([882], ['T'], ['A'], [40]), ([2663], ['T'], ['A'], [23]), ([832], ['G'], ['A'], [40]), ([2792], ['C'], ['T'], [40])]
import pandas as pd

columns = ["A", "B", "C", "D"]
df = pd.DataFrame(data, columns=columns)
# Explode all four columns together (needs pandas >= 1.3) so the lists stay aligned row by row.
# Exploding one column at a time pairs every position with every value from the same row.
df = df.explode(columns).astype({"A": int, "D": int})
df.groupby(["A", "B", "C"]).sum()
            D
A    B C
99   A C   43
103  A C   28
108  A C   28
444  A T   10
471  A T   15
586  G A  160
784  T A   27
832  G A  317
882  T A   40
2663 T A   23
2730 A G  474
2792 C T  239
2810 C T  440
3131 G A   40
3493 G T   40
3578 A T   40
4146 A T   33
4173 T G    9
4248 T A   33
4623 A G   29
4624 T C   12
4687 T G   22
5373 T C   31

CodePudding user response:

You can try the groupby() function from the built-in itertools module. It only groups consecutive items that share the same key, so unless the data is already ordered by the first three elements of each tuple, it has to be sorted first. Once it is sorted, call groupby() with a key function that returns the first three elements of each tuple. Then, for each item in a group, take the list at index 3 and sum() its values, and sum() those sub-totals across the group.
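To see why the sort matters, here is a small sketch on toy data. The records and the helper name `sum_by_key` are made up for illustration and are not part of the question's list.

```python
from itertools import groupby

# Hypothetical toy records: (key, value) pairs, deliberately left unsorted.
records = [("a", 1), ("b", 2), ("a", 3)]

def sum_by_key(rows):
    # groupby starts a new group every time the key changes,
    # so equal keys that are not adjacent end up in separate groups.
    return [(key, sum(value for _, value in group))
            for key, group in groupby(rows, key=lambda row: row[0])]

unsorted_result = sum_by_key(records)
sorted_result = sum_by_key(sorted(records))

print(unsorted_result)  # [('a', 1), ('b', 2), ('a', 3)]
print(sorted_result)    # [('a', 4), ('b', 2)]
```

Without the sort, the two "a" records are reported as two groups; with it, they are merged into one.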

from itertools import groupby
# my_list is the list of tuples from the question
[(*k, sum(sum(item[3]) for item in v)) for k, v in groupby(sorted(my_list), lambda x: x[:3])]
[([99, 103], ['A', 'A'], ['C', 'C'], 52),
 ([99, 108], ['A', 'A'], ['C', 'C'], 47),
 ([444, 471], ['A', 'A'], ['T', 'T'], 25),
 ([586], ['G'], ['A'], 160),
 ([784], ['T'], ['A'], 27),
 ([832], ['G'], ['A'], 317),
 ([882], ['T'], ['A'], 40),
 ([2663], ['T'], ['A'], 23),
 ([2730], ['A'], ['G'], 474),
 ([2792], ['C'], ['T'], 119),
 ([2792, 2810], ['C', 'C'], ['T', 'T'], 240),
 ([2810], ['C'], ['T'], 320),
 ([3131], ['G'], ['A'], 40),
 ([3493], ['G'], ['T'], 40),
 ([3578], ['A'], ['T'], 40),
 ([4146, 4173], ['A', 'T'], ['T', 'G'], 42),
 ([4248], ['T'], ['A'], 33),
 ([4623, 4624], ['A', 'T'], ['G', 'C'], 41),
 ([4687], ['T'], ['G'], 22),
 ([5373], ['T'], ['C'], 31)]
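
Note that this result keeps multi-position tuples such as ([2792, 2810], ['C', 'C'], ['T', 'T'], ...) as groups of their own, while the question asks for them to be counted as separate cases. A minimal sketch of one way to handle that, assuming the four lists in each tuple always have the same length: split every tuple with zip() and accumulate the totals in a dictionary, which needs no sorting beforehand. The function name `sum_split_records` is hypothetical, and `sample` is only a short excerpt of the question's list.

```python
from collections import defaultdict

# A short excerpt of the list from the question.
sample = [
    ([2792], ['C'], ['T'], [39]),
    ([2810], ['C'], ['T'], [40]),
    ([2792, 2810], ['C', 'C'], ['T', 'T'], [40, 40]),
    ([4623, 4624], ['A', 'T'], ['G', 'C'], [29, 12]),
]

def sum_split_records(records):
    totals = defaultdict(int)
    for positions, refs, alts, values in records:
        # zip pairs the i-th entry of each list, so a tuple with
        # two positions contributes two separate records.
        for pos, ref, alt, value in zip(positions, refs, alts, values):
            totals[(pos, ref, alt)] += value
    # Rebuild the question's shape: every field wrapped in a one-element list.
    return [([pos], [ref], [alt], [total])
            for (pos, ref, alt), total in sorted(totals.items())]

result = sum_split_records(sample)
print(result)
# [([2792], ['C'], ['T'], [79]), ([2810], ['C'], ['T'], [80]),
#  ([4623], ['A'], ['G'], [29]), ([4624], ['T'], ['C'], [12])]
```

Calling `sum_split_records(my_list)` on the full list should work the same way. The final sorted() only orders the output and is not needed for the sums themselves.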