How to calculate the variance of dataframe in this format?

Time:10-08

Here is a simplified dataframe (the real one has the same format, just with more rows and columns):

import pandas as pd
import numpy as np

row = (1, 2)
columns = ["x", "y", "x", "y", "x", "y", "x", "y"]
data = ([10, 2, 8, 1.5, 9, 2, 11, 1.6], [8, 3, 7.5, 2.2, 9, 2, 8.6, 2.3])

df = pd.DataFrame(data, index = row, columns = columns)

I want to calculate the variance of the x values and of the y values separately for each of rows 1 and 2. The ideal output has one row per index label (1, 2) and one column each for x and y.

Any hint or help is appreciated

CodePudding user response:

Try this -

  1. Unstack to turn the dataframe into a Series indexed by (column label, row label).
  2. Group by both index levels ([x, y] and [1, 2]) and calculate the variance of each group.
  3. Unstack and transpose to get [x, y] as columns.
df.unstack().groupby(level=[0,1]).var().unstack().T
          x         y
1  1.666667  0.069167
2  0.435833  0.189167
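
To see what each step of the one-liner does, it can be split into intermediate variables. This is a minimal sketch using the question's data; the names stacked, grouped, result, check and alt are illustrative only. It also cross-checks one cell with NumPy, assuming the sample variance (ddof=1, which is the pandas default for var), and shows a shorter transpose-based variant that should give the same table.

```python
import numpy as np
import pandas as pd

row = (1, 2)
columns = ["x", "y", "x", "y", "x", "y", "x", "y"]
# Same values as in the question, written as a list of rows.
data = [[10, 2, 8, 1.5, 9, 2, 11, 1.6], [8, 3, 7.5, 2.2, 9, 2, 8.6, 2.3]]
df = pd.DataFrame(data, index=row, columns=columns)

# Step 1: flatten to a Series indexed by (column label, row label).
stacked = df.unstack()

# Step 2: sample variance (ddof=1) for each (column label, row label) pair.
grouped = stacked.groupby(level=[0, 1]).var()

# Step 3: move the row labels into columns, then transpose so x, y are columns.
result = grouped.unstack().T
print(result)

# Cross-check one cell: the x values of row 1, selected with a boolean mask
# because the column labels are duplicated.
x_row1 = df.loc[1, df.columns == "x"].to_numpy()
check = np.var(x_row1, ddof=1)

# Alternative sketch: group the transposed frame by its (duplicated) index labels.
alt = df.T.groupby(level=0).var().T
```

If the population variance is wanted instead, var accepts ddof=0 in both variants.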