Calculating averages dynamically with python-CodePudding

I have a large dataset with thousands of rows though fewer columns, i have ordered them by row values so that each of the 'objects' are grouped together, just like the dataset in Table1 below:

#Table1 :

data = [['ALFA', 351740.00, 0.31, 0.22, 0.44, 0.19, 0.05], 
        ['ALFA', 401740.00, 0.43, 0.26, 0.23, 0.16, 0.09], 
        ['ALFA', 892350.00, 0.58, 0.24, 0.05, 0.07, 0.4], 
        ['Bravo', 511830.00, 0.52, 0.16, 0.08, 0.26, 0], 
        ['Charlie', 590030.00, 0.75, 0.2, 0.14, 0.37, 0.06], 
        ['Charlie', 590030.00, 0.75, 0.2, 0.27, 0.2, 0.01], 
        ['Charlie', 590030.00, 0.75, 0.2, 0.29, 0.11, 0.04], 
        ['Charlie', 590030.00, 0.75, 0.2, 0.27, 0.2, 0.01], 
        ['Charlie', 401740.00, 0.43, 0.26, 0.14, 0.37, 0.06], 
        ['Charlie', 511830.00, 0.52, 0.16, 0.13, 0.22, 0.01], 
        ['Delta', 590030.00, 0.75, 0.2, 0.34, 0.3, 0], 
        ['Delta', 590030.00, 0.75, 0.2, 0, 0.28, 0], 
        ['Delta', 351740.00, 0.31, 0.22, 0.44, 0.19, 0.05], 
        ['Echo', 892350.00, 0.58, 0.24, 0.23, 0.16, 0.09], 
        ['Echo', 590030.00, 0.75, 0.2, 0.05, 0.07, 0.4], 
        ['Echo', 590030.00, 0.75, 0.2, 0.08, 0.26, 0], 
        ['Echo', 590030.00, 0.75, 0.2, 0.14, 0.37, 0.06], 
        ['Foxtrot', 401740.00, 0.43, 0.26, 0.27, 0.2, 0.01], 
        ['Foxtrot', 511830.00, 0.52, 0.16, 0.29, 0.11, 0.04], 
        ['Golf', 590030.00, 0.75, 0.2, 0.27, 0.2, 0.01], 
        ['Golf', 590030.00, 0.75, 0.2, 0.14, 0.37, 0.06], 
        ['Golf', 351740.00, 0.31, 0.22, 0.13, 0.22, 0.01], 
        ['Hotel', 892350.00, 0.58, 0.24, 0.34, 0.3, 0], 
        ['Hotel', 590030.00, 0.75, 0.2, 0, 0.28, 0], 
        ['Hotel', 590030.00, 0.75, 0.2, 0.29, 0.11, 0.04]]

df = pd.DataFrame(data, columns= ['Objects', 'Column1', 'Column2', 'Column3', 'Column4', 'Column5', 'Column6'])

df

However i would like to partition the data by these objects and get the averages for all the columns only in a separate table much like Table2 below:

#Table2: 

data2 = [['ALFA', 548610.00, 0.44, 0.24, 0.24, 0.14, 0.18], 
        ['Bravo', 511830.00, 0.52, 0.16, 0.08, 0.26, 0], 
        ['Charlie', 545615.00, 0.66, 0.20, 0.21, 0.25, 0.03], 
        ['Delta', 510600.00, 0.60, 0.21, 0.26, 0.26, 0.02], 
        ['Echo', 665610.00, 0.71, 0.21, 0.13, 0.22, 0.14], 
        ['Foxtrot', 456785.00, 0.48, 0.21, 0.28, 0.16, 0.03], 
        ['Golf', 510600.00, 0.60, 0.21, 0.18, 0.26, 0.03], 
        ['Hotel', 690803.33, 0.69, 0.21, 0.21, 0.23, 0.01]]

df2 = pd.DataFrame(data, columns= ['Objects', 'Column1', 'Column2', 'Column3', 'Column4', 'Column5', 'Column6'])

df2

Please note that the number of the objects vary across the dataset so the query would be able to count the number of objects and use that number to get the average of all the columns for each object and then present all these values in a new table.

For instance note that the '548610.00' values in Table2 for ALFA(column1) is merely an addition of Column1 values of ALFA in Table1 (351740.00 401740.00 401740.00) and divide by the count of ALFA being '3'.

CodePudding user response：

Just use the groupby() function from pandas:

df.groupby('Objects').mean()

Instead of mean() other functions like min() are possible as well.

CodePudding user response：

Can be done with:

df2 = pd.DataFrame(
    data, 
    columns=['Objects', 'Column1', 'Column2', 'Column3', 'Column4', 'Column5', 'Column6']
).groupby(by='Objects').mean().round(decimals=2).reset_index()
print(df2)

Output:

   Objects    Column1  Column2  Column3  Column4  Column5  Column6
0     ALFA  548610.00     0.44     0.24     0.24     0.14     0.18
1    Bravo  511830.00     0.52     0.16     0.08     0.26     0.00
2  Charlie  545615.00     0.66     0.20     0.21     0.24     0.03
3    Delta  510600.00     0.60     0.21     0.26     0.26     0.02
4     Echo  665610.00     0.71     0.21     0.12     0.22     0.14
5  Foxtrot  456785.00     0.48     0.21     0.28     0.16     0.02
6     Golf  510600.00     0.60     0.21     0.18     0.26     0.03
7    Hotel  690803.33     0.69     0.21     0.21     0.23     0.01