How to aggregate columns based on the value of others

If I had a dataframe such as this, how would I create aggregates such as min, max and mean for each Port in each given year?

import pandas as pd

df1 = pd.DataFrame({'Year': {0: 2019, 1: 2019, 2: 2019, 3: 2019, 4: 2019},'Port': {0: 'NORTH SHIELDS', 1: 'NORTH SHIELDS', 2: 'NORTH SHIELDS', 3: 'NORTH SHIELDS', 4: 'NORTH SHIELDS'},'Vessel capacity units': {0: 760.5, 1: 760.5, 2: 760.5, 3: 760.5, 4: 760.5},'Engine power': {0: 790.0, 1: 790.0, 2: 790.0, 3: 790.0, 4: 790.0},'Registered tonnage': {0: 516.0, 1: 516.0, 2: 516.0, 3: 516.0, 4: 516.0},'Overall length': {0: 45.0, 1: 45.0, 2: 45.0, 3: 45.0, 4: 45.0},'Value(£)': {0: 2675.81, 1: 62.98, 2: 9.67, 3: 527.02, 4: 2079.0}, 'Landed Weight (tonnes)': {0: 0.978, 1: 0.0135, 2: 0.001, 3: 0.3198, 4: 3.832}})

df1

CodePudding user response：

IIUC

df1.groupby(['Port', 'Year'])['<WHATEVER COLUMN HERE>'].agg(['count', 'min', 'max', 'mean'])  # groups by 'Port' and 'Year', then computes count, min, max and mean of the chosen column
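
As a rough sketch of how this might look end to end, here is the same idea on a cut-down version of the question's data. The column names follow the question ('Port', 'Year'). The 'BLYTH' rows and the 2020 year are invented purely so that more than one group shows up. Leaving out the column selection aggregates every remaining column at once.

```python
import pandas as pd

# Cut-down sample; the 'BLYTH' rows and the year 2020 are made up for illustration.
df1 = pd.DataFrame({
    "Year": [2019, 2019, 2019, 2020, 2020],
    "Port": ["NORTH SHIELDS", "NORTH SHIELDS", "BLYTH", "BLYTH", "BLYTH"],
    "Value(£)": [2675.81, 62.98, 9.67, 527.02, 2079.0],
    "Landed Weight (tonnes)": [0.978, 0.0135, 0.001, 0.3198, 3.832],
})

# One row per (Port, Year); each remaining column gets min, max and mean.
summary = df1.groupby(["Port", "Year"]).agg(["min", "max", "mean"])
print(summary)

# Individual values are addressed by (Port, Year) and (column, statistic).
print(summary.loc[("NORTH SHIELDS", 2019), ("Value(£)", "max")])
```

The result has a two-level column index, so a single statistic is looked up with a (column, statistic) pair as in the last line.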

CodePudding user response：

Without any background information this question is tricky. Do you want it for every year or just for some given years?

Extracting the min/max/mean etc. is quite straightforward. I assume that you have some kind of data file and have loaded it into a DataFrame:

import pandas as pd

file = 'my-data.csv'  # the data file
df = pd.read_csv(file)

VALUE_I_WANT_TO_EXTRAXT = df['Column name']

Then for each port you can extract the min/max/mean data like this.

for port, values in VALUE_I_WANT_TO_EXTRAXT.groupby(df['Port']):  # one group of values per port
    print(port, values.min(), values.max(), values.mean())
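
On the question of every year versus only some given years, one option is to filter the rows before grouping. This is only a sketch: the column names follow the question, while the data and the `years_wanted` list are hypothetical.

```python
import pandas as pd

# Hypothetical data; only the column names follow the question.
df = pd.DataFrame({
    "Year": [2018, 2019, 2019, 2020],
    "Port": ["NORTH SHIELDS", "NORTH SHIELDS", "NORTH SHIELDS", "BLYTH"],
    "Value(£)": [100.0, 62.98, 2675.81, 527.02],
})

years_wanted = [2019, 2020]  # hypothetical selection of years

# Keep only the requested years, then aggregate per port and year.
subset = df[df["Year"].isin(years_wanted)]
stats = subset.groupby(["Port", "Year"])["Value(£)"].agg(["min", "max", "mean"])
print(stats)
```

Dropping the `isin` filter gives the statistics for every year present in the data.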

But, as I said, without any specific knowledge about the problem it is hard to provide a solution.