Pandas - Calculate expected frequency table

Time:11-19

Consider the following dataframe:

import pandas as pd

data = [[1, 2, 3, 4], [4, 3, 2, 1]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

What would be the most efficient way to generate an expected frequency table? That is, for each cell, compute (row total * column total) / (grand total), where the grand total is the sum of all cells.

So that the final dataframe is:

data = [[2.5, 2.5, 2.5, 2.5], [2.5, 2.5, 2.5, 2.5]] 
df = pd.DataFrame(data, columns = ['A', 'B', 'C', 'D'])

CodePudding user response:

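You can use the underlying NumPy array and broadcasting:

a = df.to_numpy()
# a.sum(0): column totals, shape (4,); a.sum(1)[:, None]: row totals, shape (2, 1).
# Broadcasting their product gives a (2, 4) array of row total * column total.
pd.DataFrame((a.sum(0) * a.sum(1)[:, None]) / a.sum(),
             columns=df.columns, index=df.index)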

Output:

     A    B    C    D
0  2.5  2.5  2.5  2.5
1  2.5  2.5  2.5  2.5
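
The example in the question has equal row and column totals, so every expected value comes out the same and the formula is hard to see at work. As a rough sketch, the same calculation can be wrapped in a helper and tried on a second table. The function name expected_frequencies and the 2x2 table below are made up for illustration and are not part of the original question or answer; np.outer is used here in place of explicit broadcasting and should give the same result.

```python
import numpy as np
import pandas as pd


def expected_frequencies(observed: pd.DataFrame) -> pd.DataFrame:
    """Return (row total * column total) / grand total for every cell."""
    row_totals = observed.sum(axis=1).to_numpy()
    col_totals = observed.sum(axis=0).to_numpy()
    grand_total = row_totals.sum()
    # The outer product pairs every row total with every column total.
    expected = np.outer(row_totals, col_totals) / grand_total
    return pd.DataFrame(expected, index=observed.index, columns=observed.columns)


# The dataframe from the question: every cell should come out as 2.5.
df = pd.DataFrame([[1, 2, 3, 4], [4, 3, 2, 1]], columns=["A", "B", "C", "D"])
expected = expected_frequencies(df)

# Hypothetical table with unequal totals, to show the values differing per cell.
toy = pd.DataFrame([[10, 20], [30, 40]], columns=["X", "Y"])
toy_expected = expected_frequencies(toy)
```

For the second table the row totals are 30 and 70, the column totals are 40 and 60, and the grand total is 100, so the expected values are 12 and 18 in the first row and 28 and 42 in the second.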