I have a dataframe with different sections (only 2 sections and speeds here, but a circuit can be up to 8 sections and 6 measured speeds) like so:
section | speed | Data1 | Data2 |
---|---|---|---|
A | 10 | 1.5 | 2.5 |
A | 20 | 1.0 | 2.0 |
B | 10 | 2.5 | 3.5 |
B | 20 | 2.0 | 3.0 |
I would like to sum my data columns over all possible circuits
A | B | Data1 | Data2 |
---|---|---|---|
10 | 10 | 4.0 | 6.0 |
10 | 20 | 3.5 | 5.5 |
20 | 10 | 3.5 | 5.5 |
20 | 20 | 3.0 | 5.0 |
How would I do this? I can make the combinations, but not sure how to sum the data columns over them.
CodePudding user response:
What about using itertools.product
, then summing per group:
from itertools import product
df2 = df.set_index(['section', 'speed']).T
out = (pd.concat({k: df2[list(k)].sum(1)
for k in product(*(d for _,d in df2.groupby(axis=1, level=0)))})
.unstack(level=-1)
)
output:
Data1 Data2
(A, 10) (B, 10) 4.0 6.0
(B, 20) 3.5 5.5
(A, 20) (B, 10) 3.5 5.5
(B, 20) 3.0 5.0
For the exact provided format:
df2 = df.set_index(['section', 'speed']).T
sections = df2.columns.get_level_values('section').unique()
out = (pd.concat({tuple(x[1] for x in k):
df2[list(k)].sum(1)
for k in product(*(d for _,d in df2.groupby(axis=1, level=0)))
})
.unstack(level=-1)
.rename_axis(sections).reset_index()
)
output:
A B Data1 Data2
0 10 10 4.0 6.0
1 10 20 3.5 5.5
2 20 10 3.5 5.5
3 20 20 3.0 5.0
CodePudding user response:
One approach:
from itertools import product
groups = [[row for i, row in v.iterrows()] for _, v in df.groupby("section")]
rows = []
for p in product(*groups):
row = {}
for e in p:
d = e.to_dict()
row[d.pop("section")] = d.pop("speed")
for k, v in d.items():
row[k] = row.get(k, 0) v
rows.append(row)
res = pd.DataFrame(rows)
print(res)
Output
A Data1 Data2 B
0 10 4.0 6.0 10
1 10 3.5 5.5 20
2 20 3.5 5.5 10
3 20 3.0 5.0 20
Or more pythonic:
def build_row(prod):
row = {}
for e in prod:
d = e.to_dict()
row[d.pop("section")] = d.pop("speed")
for k, v in d.items():
row[k] = row.get(k, 0) v
return row
groups = [[row for i, row in v.iterrows()] for _, v in df.groupby("section")]
res = pd.DataFrame([build_row(p) for p in product(*groups)])
print(res)
Note that if you want exact output, just reorder the columns.