I have a simple dataframe with two columns and I would like to generate an output table showing the differences between rows as a matrix?
df = pd.DataFrame({'Subject': ['Alpha', 'Bravo', 'Charlie'],
'Total': [31.05590, 32.91925, 36.02484]})
Subject | Total |
---|---|
Alpha | 31.05590 |
Bravo | 32.91925 |
Charlie | 36.02484 |
The expected output is basically the differences between rows.
For example:
Alpha-Charlie = Alpha(Total) - Charlie(Total) -> (31.05590 - 36.02484) = -4.96894 and
Bravo-Alpha = Bravo(Total) - Alpha(Total) -> (32.91925 - 31.05590) = 1.86335.
Subject | Alpha | Bravo | Charlie |
---|---|---|---|
Alpha | 0.00000 | -1.86335 | -4.96894 |
Bravo | 1.86335 | 0.00000 | -3.10559 |
Charlie | 4.96894 | 3.10559 | 0.00000 |
So far, I have not made much progress!
ct_a = pd.crosstab(df['Subject'], df['Subject'], values=df['Total'], aggfunc=np.sum).fillna(0)
ct_a.to_csv('data/ct_a.csv', index=True)
This code generates the following output:
Subject | Alpha | Bravo | Charlie |
---|---|---|---|
Alpha | 31.05590 | 0.00000 | 0.00000 |
Bravo | 0.00000 | 32.91925 | 0.00000 |
Charlie | 0.00000 | 0.00000 | 36.02484 |
I would welcome any advice on how to proceed?
CodePudding user response:
Here is an approach using pandas.merge
and pandas.crosstab
:
out = (
df.merge(df, how="cross", suffixes=("", "_c"))
.assign(Total= lambda x: x["Total"].sub(x["Total_c"]))
.pipe(lambda d: pd.crosstab(d["Subject"], d["Subject_c"],
values=d["Total"], aggfunc="sum"))
.rename_axis(None, axis=1)
.reset_index()
)
# Output :
print(out)
Subject Alpha Bravo Charlie
0 Alpha 0.00000 -1.86335 -4.96894
1 Bravo 1.86335 0.00000 -3.10559
2 Charlie 4.96894 3.10559 0.00000