Combine statistical outputs in Python

I'm trying to combine the statistical summaries of two related datasets with pandas. The first looks like this:

PoweredUp

           min  max      mean  median       var       std
magic     -1.0  1.0  0.282669     0.8  0.659919  0.812354
magnitude  1.0  1.0  1.000000     1.0  0.000000  0.000000
power      0.0  0.0  0.000000     0.0  0.000000  0.000000

PoweredDown

           min  max      mean  median       var       std
magic     -1.0  1.0  0.473780     1.0  0.586732  0.765984
magnitude  1.0  1.0  1.000000     1.0  0.000000  0.000000
power      1.0  2.0  1.152439     1.0  0.129994  0.360547

I want a single DataFrame that holds the statistics from both. I'm not sure of the best approach: perhaps prefix the magic, magnitude and power labels with PoweredUp and PoweredDown, then transpose and join the DataFrames?

CodePudding user response：

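pandas.concat can produce any of these layouts.

Pass a dictionary to pandas.concat and its keys become an outer index level, with the two DataFrames stacked vertically: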

pd.concat({'PoweredUp': PoweredUp, 'PoweredDown': PoweredDown})

output:

                       min  max      mean  median       var       std
PoweredUp   magic     -1.0  1.0  0.282669     0.8  0.659919  0.812354
            magnitude  1.0  1.0  1.000000     1.0  0.000000  0.000000
            power      0.0  0.0  0.000000     0.0  0.000000  0.000000
PoweredDown magic     -1.0  1.0  0.473780     1.0  0.586732  0.765984
            magnitude  1.0  1.0  1.000000     1.0  0.000000  0.000000
            power      1.0  2.0  1.152439     1.0  0.129994  0.360547

Or side by side, with the keys as the top level of the columns:

pd.concat({'PoweredUp': PoweredUp, 'PoweredDown': PoweredDown}, axis=1)

output:

          PoweredUp                                           PoweredDown                                          
                min  max      mean median       var       std         min  max      mean median       var       std
magic          -1.0  1.0  0.282669    0.8  0.659919  0.812354        -1.0  1.0  0.473780    1.0  0.586732  0.765984
magnitude       1.0  1.0  1.000000    1.0  0.000000  0.000000         1.0  1.0  1.000000    1.0  0.000000  0.000000
power           0.0  0.0  0.000000    0.0  0.000000  0.000000         1.0  2.0  1.152439    1.0  0.129994  0.360547

Or, with prefixes/suffixes:

pd.concat([PoweredUp.add_suffix('_up'), PoweredDown.add_suffix('_down')], axis=1)

output:

           min_up  max_up   mean_up  median_up    var_up    std_up  min_down  max_down  mean_down  median_down  var_down  std_down
magic        -1.0     1.0  0.282669        0.8  0.659919  0.812354      -1.0       1.0   0.473780          1.0  0.586732  0.765984
magnitude     1.0     1.0  1.000000        1.0  0.000000  0.000000       1.0       1.0   1.000000          1.0  0.000000  0.000000
power         0.0     0.0  0.000000        0.0  0.000000  0.000000       1.0       2.0   1.152439          1.0  0.129994  0.360547
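
Once the frames are combined with keys, the extra level can be used for selection. The following is only a sketch of how that might look: it rebuilds the two frames from the numbers in the question, and the level names 'dataset' and 'output' are my own labels, not something from the original data.

```python
import pandas as pd

stats = ['min', 'max', 'mean', 'median', 'var', 'std']
outputs = ['magic', 'magnitude', 'power']

# Rebuild the two summaries from the values shown in the question
PoweredUp = pd.DataFrame(
    [[-1.0, 1.0, 0.282669, 0.8, 0.659919, 0.812354],
     [1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]],
    index=outputs, columns=stats)
PoweredDown = pd.DataFrame(
    [[-1.0, 1.0, 0.473780, 1.0, 0.586732, 0.765984],
     [1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
     [1.0, 2.0, 1.152439, 1.0, 0.129994, 0.360547]],
    index=outputs, columns=stats)

# Vertical layout; names= labels the two index levels (labels are illustrative)
stacked = pd.concat({'PoweredUp': PoweredUp, 'PoweredDown': PoweredDown},
                    names=['dataset', 'output'])

# All statistics for one dataset
up_only = stacked.loc['PoweredUp']

# One output across both datasets
magic_both = stacked.xs('magic', level='output')

# A single value: the mean of 'power' in PoweredDown
power_down_mean = stacked.loc[('PoweredDown', 'power'), 'mean']

# Side-by-side layout; pick one statistic for every dataset
wide = pd.concat({'PoweredUp': PoweredUp, 'PoweredDown': PoweredDown}, axis=1)
means = wide.xs('mean', axis=1, level=1)
print(means)
```

With the suffix variant the columns stay flat, so ordinary column selection such as `['mean_up', 'mean_down']` is enough there.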

CodePudding user response：

One way (among several, depending on the output you prefer) is to add a column named 'dataset' to each DataFrame and concatenate them:

import pandas as pd

# Raw statistics, one row per output, in the order min, max, mean, median, var, std
PoweredUp = [
    [-1.0, 1.0, 0.282669, 0.8, 0.659919, 0.812354],
    [1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
]
PoweredDown = [
    [-1.0, 1.0, 0.473780, 1.0, 0.586732, 0.76598],
    [1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 1.152439, 1.0, 0.129994, 0.360547]
]
outputNames = ['magic', 'magnitude', 'power']
colNames = ['dataset', 'output', 'min', 'max', 'mean', 'median', 'var', 'std']
# The raw rows hold only the six statistics, so skip 'dataset' and 'output' here
up_df = pd.DataFrame(PoweredUp, index=outputNames, columns=colNames[2:])
up_df.insert(0, 'dataset', 'PoweredUp')
down_df = pd.DataFrame(PoweredDown, index=outputNames, columns=colNames[2:])
down_df.insert(0, 'dataset', 'PoweredDown')
df = pd.concat([up_df, down_df])
print(df)

Output:

               dataset  min  max      mean  median       var       std
magic        PoweredUp -1.0  1.0  0.282669     0.8  0.659919  0.812354
magnitude    PoweredUp  1.0  1.0  1.000000     1.0  0.000000  0.000000
power        PoweredUp  0.0  0.0  0.000000     0.0  0.000000  0.000000
magic      PoweredDown -1.0  1.0  0.473780     1.0  0.586732  0.765980
magnitude  PoweredDown  1.0  1.0  1.000000     1.0  0.000000  0.000000
power      PoweredDown  1.0  2.0  1.152439     1.0  0.129994  0.360547

A second way is to create both a 'dataset' column and an 'output' column and use a numerical index:

# PoweredUp and PoweredDown are the raw lists of lists here, not DataFrames
up_df = pd.DataFrame([['PoweredUp', output] + row for output, row in zip(outputNames, PoweredUp)], columns=colNames)
down_df = pd.DataFrame([['PoweredDown', output] + row for output, row in zip(outputNames, PoweredDown)], columns=colNames)
df = pd.concat([up_df, down_df], ignore_index=True)
print(df)

Output:

       dataset     output  min  max      mean  median       var       std
0    PoweredUp      magic -1.0  1.0  0.282669     0.8  0.659919  0.812354
1    PoweredUp  magnitude  1.0  1.0  1.000000     1.0  0.000000  0.000000
2    PoweredUp      power  0.0  0.0  0.000000     0.0  0.000000  0.000000
3  PoweredDown      magic -1.0  1.0  0.473780     1.0  0.586732  0.765980
4  PoweredDown  magnitude  1.0  1.0  1.000000     1.0  0.000000  0.000000
5  PoweredDown      power  1.0  2.0  1.152439     1.0  0.129994  0.360547
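
A long table like this is convenient for filtering and for comparing one statistic across datasets. As a rough sketch (the variable names `mean_by_dataset` and `down` are mine, and the rows are simply copied from the output above), something like this should work:

```python
import pandas as pd

colNames = ['dataset', 'output', 'min', 'max', 'mean', 'median', 'var', 'std']
rows = [
    ['PoweredUp', 'magic', -1.0, 1.0, 0.282669, 0.8, 0.659919, 0.812354],
    ['PoweredUp', 'magnitude', 1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
    ['PoweredUp', 'power', 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    ['PoweredDown', 'magic', -1.0, 1.0, 0.473780, 1.0, 0.586732, 0.76598],
    ['PoweredDown', 'magnitude', 1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
    ['PoweredDown', 'power', 1.0, 2.0, 1.152439, 1.0, 0.129994, 0.360547],
]
df = pd.DataFrame(rows, columns=colNames)

# One statistic side by side: one row per output, one column per dataset
mean_by_dataset = df.pivot(index='output', columns='dataset', values='mean')
print(mean_by_dataset)

# Rows for a single dataset, using an ordinary boolean mask
down = df[df['dataset'] == 'PoweredDown']
print(down)
```

Calling `df.set_index(['dataset', 'output'])` on the same table would give a two-level index similar to the dictionary-based `pd.concat` result in the other answer.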