I have a large dataset with thousands of rows though fewer columns, i have ordered them by row values so that each of the 'objects' are grouped together, just like the dataset in Table1 below:
#Table1 :
data = [['ALFA', 351740.00, 0.31, 0.22, 0.44, 0.19, 0.05],
['ALFA', 401740.00, 0.43, 0.26, 0.23, 0.16, 0.09],
['ALFA', 892350.00, 0.58, 0.24, 0.05, 0.07, 0.4],
['Bravo', 511830.00, 0.52, 0.16, 0.08, 0.26, 0],
['Charlie', 590030.00, 0.75, 0.2, 0.14, 0.37, 0.06],
['Charlie', 590030.00, 0.75, 0.2, 0.27, 0.2, 0.01],
['Charlie', 590030.00, 0.75, 0.2, 0.29, 0.11, 0.04],
['Charlie', 590030.00, 0.75, 0.2, 0.27, 0.2, 0.01],
['Charlie', 401740.00, 0.43, 0.26, 0.14, 0.37, 0.06],
['Charlie', 511830.00, 0.52, 0.16, 0.13, 0.22, 0.01],
['Delta', 590030.00, 0.75, 0.2, 0.34, 0.3, 0],
['Delta', 590030.00, 0.75, 0.2, 0, 0.28, 0],
['Delta', 351740.00, 0.31, 0.22, 0.44, 0.19, 0.05],
['Echo', 892350.00, 0.58, 0.24, 0.23, 0.16, 0.09],
['Echo', 590030.00, 0.75, 0.2, 0.05, 0.07, 0.4],
['Echo', 590030.00, 0.75, 0.2, 0.08, 0.26, 0],
['Echo', 590030.00, 0.75, 0.2, 0.14, 0.37, 0.06],
['Foxtrot', 401740.00, 0.43, 0.26, 0.27, 0.2, 0.01],
['Foxtrot', 511830.00, 0.52, 0.16, 0.29, 0.11, 0.04],
['Golf', 590030.00, 0.75, 0.2, 0.27, 0.2, 0.01],
['Golf', 590030.00, 0.75, 0.2, 0.14, 0.37, 0.06],
['Golf', 351740.00, 0.31, 0.22, 0.13, 0.22, 0.01],
['Hotel', 892350.00, 0.58, 0.24, 0.34, 0.3, 0],
['Hotel', 590030.00, 0.75, 0.2, 0, 0.28, 0],
['Hotel', 590030.00, 0.75, 0.2, 0.29, 0.11, 0.04]]
df = pd.DataFrame(data, columns= ['Objects', 'Column1', 'Column2', 'Column3', 'Column4', 'Column5', 'Column6'])
df
However i would like to partition the data by these objects and get the averages for all the columns only in a separate table much like Table2 below:
#Table2:
data2 = [['ALFA', 548610.00, 0.44, 0.24, 0.24, 0.14, 0.18],
['Bravo', 511830.00, 0.52, 0.16, 0.08, 0.26, 0],
['Charlie', 545615.00, 0.66, 0.20, 0.21, 0.25, 0.03],
['Delta', 510600.00, 0.60, 0.21, 0.26, 0.26, 0.02],
['Echo', 665610.00, 0.71, 0.21, 0.13, 0.22, 0.14],
['Foxtrot', 456785.00, 0.48, 0.21, 0.28, 0.16, 0.03],
['Golf', 510600.00, 0.60, 0.21, 0.18, 0.26, 0.03],
['Hotel', 690803.33, 0.69, 0.21, 0.21, 0.23, 0.01]]
df2 = pd.DataFrame(data, columns= ['Objects', 'Column1', 'Column2', 'Column3', 'Column4', 'Column5', 'Column6'])
df2
Please note that the number of the objects vary across the dataset so the query would be able to count the number of objects and use that number to get the average of all the columns for each object and then present all these values in a new table.
For instance note that the '548610.00' values in Table2 for ALFA(column1) is merely an addition of Column1 values of ALFA in Table1 (351740.00 401740.00 401740.00) and divide by the count of ALFA being '3'.
CodePudding user response:
Just use the groupby()
function from pandas:
df.groupby('Objects').mean()
Instead of mean()
other functions like min()
are possible as well.
CodePudding user response:
Can be done with:
df2 = pd.DataFrame(
data,
columns=['Objects', 'Column1', 'Column2', 'Column3', 'Column4', 'Column5', 'Column6']
).groupby(by='Objects').mean().round(decimals=2).reset_index()
print(df2)
Output:
Objects Column1 Column2 Column3 Column4 Column5 Column6
0 ALFA 548610.00 0.44 0.24 0.24 0.14 0.18
1 Bravo 511830.00 0.52 0.16 0.08 0.26 0.00
2 Charlie 545615.00 0.66 0.20 0.21 0.24 0.03
3 Delta 510600.00 0.60 0.21 0.26 0.26 0.02
4 Echo 665610.00 0.71 0.21 0.12 0.22 0.14
5 Foxtrot 456785.00 0.48 0.21 0.28 0.16 0.02
6 Golf 510600.00 0.60 0.21 0.18 0.26 0.03
7 Hotel 690803.33 0.69 0.21 0.21 0.23 0.01