Home > database >  Summing dataframe over combinations of index columns
Summing dataframe over combinations of index columns

Time:07-20

I have a dataframe with different sections (only 2 sections and speeds here, but a circuit can be up to 8 sections and 6 measured speeds) like so:

section speed Data1 Data2
A 10 1.5 2.5
A 20 1.0 2.0
B 10 2.5 3.5
B 20 2.0 3.0

I would like to sum my data columns over all possible circuits

A B Data1 Data2
10 10 4.0 6.0
10 20 3.5 5.5
20 10 3.5 5.5
20 20 3.0 5.0

How would I do this? I can make the combinations, but not sure how to sum the data columns over them.

CodePudding user response:

What about using itertools.product, then summing per group:

from itertools import product

df2 = df.set_index(['section', 'speed']).T

out = (pd.concat({k: df2[list(k)].sum(1)
                  for k in product(*(d for _,d in df2.groupby(axis=1, level=0)))})
         .unstack(level=-1)
      )

output:

                 Data1  Data2
(A, 10) (B, 10)    4.0    6.0
        (B, 20)    3.5    5.5
(A, 20) (B, 10)    3.5    5.5
        (B, 20)    3.0    5.0

For the exact provided format:

df2 = df.set_index(['section', 'speed']).T

sections = df2.columns.get_level_values('section').unique()

out = (pd.concat({tuple(x[1] for x in k):
                  df2[list(k)].sum(1)
                  for k in product(*(d for _,d in df2.groupby(axis=1, level=0)))
                 })
         .unstack(level=-1)
         .rename_axis(sections).reset_index()
      )

output:

    A   B  Data1  Data2
0  10  10    4.0    6.0
1  10  20    3.5    5.5
2  20  10    3.5    5.5
3  20  20    3.0    5.0

CodePudding user response:

One approach:

from itertools import product

groups = [[row for i, row in v.iterrows()] for _, v in df.groupby("section")]
rows = []
for p in product(*groups):
    row = {}
    for e in p:
        d = e.to_dict()
        row[d.pop("section")] = d.pop("speed")
        for k, v in d.items():
            row[k] = row.get(k, 0)   v
    rows.append(row)

res = pd.DataFrame(rows)
print(res)

Output

    A  Data1  Data2   B
0  10    4.0    6.0  10
1  10    3.5    5.5  20
2  20    3.5    5.5  10
3  20    3.0    5.0  20

Or more pythonic:

def build_row(prod):
    row = {}
    for e in prod:
        d = e.to_dict()
        row[d.pop("section")] = d.pop("speed")
        for k, v in d.items():
            row[k] = row.get(k, 0)   v
    return row


groups = [[row for i, row in v.iterrows()] for _, v in df.groupby("section")]
res = pd.DataFrame([build_row(p) for p in product(*groups)])
print(res)

Note that if you want exact output, just reorder the columns.

  • Related