Home > Net >  Pandas: Dataframe Crosstab Output To Matrix
Pandas: Dataframe Crosstab Output To Matrix

Time:12-06

I have a simple dataframe with two columns and I would like to generate an output table showing the differences between rows as a matrix?

df = pd.DataFrame({'Subject': ['Alpha', 'Bravo', 'Charlie'],
                    'Total': [31.05590, 32.91925, 36.02484]})
Subject Total
Alpha 31.05590
Bravo 32.91925
Charlie 36.02484

The expected output is basically the differences between rows.

For example:

Alpha-Charlie = Alpha(Total) - Charlie(Total) -> (31.05590 - 36.02484) = -4.96894 and

Bravo-Alpha = Bravo(Total) - Alpha(Total) -> (32.91925 - 31.05590) = 1.86335.

Subject Alpha Bravo Charlie
Alpha 0.00000 -1.86335 -4.96894
Bravo 1.86335 0.00000 -3.10559
Charlie 4.96894 3.10559 0.00000

So far, I have not made much progress!

ct_a = pd.crosstab(df['Subject'], df['Subject'], values=df['Total'], aggfunc=np.sum).fillna(0)
ct_a.to_csv('data/ct_a.csv', index=True)

This code generates the following output:

Subject Alpha Bravo Charlie
Alpha 31.05590 0.00000 0.00000
Bravo 0.00000 32.91925 0.00000
Charlie 0.00000 0.00000 36.02484

I would welcome any advice on how to proceed?

CodePudding user response:

Here is an approach using pandas.merge and pandas.crosstab :

out = (
        df.merge(df, how="cross", suffixes=("", "_c"))
            .assign(Total= lambda x: x["Total"].sub(x["Total_c"]))
            .pipe(lambda d: pd.crosstab(d["Subject"], d["Subject_c"],
                                        values=d["Total"], aggfunc="sum"))
            .rename_axis(None, axis=1)
            .reset_index()
      )

# Output :

print(out)

   Subject    Alpha    Bravo  Charlie
0    Alpha  0.00000 -1.86335 -4.96894
1    Bravo  1.86335  0.00000 -3.10559
2  Charlie  4.96894  3.10559  0.00000
  • Related