For this dataset, I want to find the sum of Value(£) for each combination of three columns: Year, Length Group and Port of Landing. For example, one sum would cover the year 2016, the Length Group 10m&Under and the Port of Landing Aberdaran.
CodePudding user response:
Given your reply to @berkayln, I think you want to project that sum back onto your original dataframe, so every row carries the total for its group. Does this suit your needs?
df['sumPerYearLengthGroupPortOfLanding'] = df.groupby(['Year', 'Length Group', 'Port of Landing'])['Value(£)'].transform('sum')
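To illustrate what the transform call above does, here is a minimal sketch on a few made-up rows. The data values are hypothetical; only the column names come from the question. It uses transform("sum"), which computes the same per-group sum and keeps the original number of rows.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    "Year": [2016, 2016, 2016, 2017],
    "Length Group": ["10m&Under", "10m&Under", "Over10m", "10m&Under"],
    "Port of Landing": ["Aberdaran", "Aberdaran", "Aberdaran", "Aberdaran"],
    "Value(£)": [100.0, 50.0, 30.0, 20.0],
})

group_cols = ["Year", "Length Group", "Port of Landing"]

# transform returns one value per original row, so the result can be
# assigned straight back as a new column.
df["sumPerYearLengthGroupPortOfLanding"] = (
    df.groupby(group_cols)["Value(£)"].transform("sum")
)
print(df)
```

The first two rows share a group, so both should show the same group total, while the dataframe keeps all four rows.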
CodePudding user response:
You can try this one:
dataframe.groupby(['Year','Length Group','Port of Landing'])['Value(£)'].sum()
That should work.
CodePudding user response:
You can use pd.DataFrame.groupby to aggregate the data.
# Change the order if you want a different hierarchy
grp_cols = ["Year", "Length Group", "Port of Landing"]
df.groupby(grp_cols)["Value(£)"].sum()
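The grouped sum comes back as a Series indexed by the three grouping columns. As a rough sketch, assuming some made-up rows (the values are hypothetical), this is how you might look up one combination or flatten the result into an ordinary table:

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    "Year": [2016, 2016, 2016, 2017],
    "Length Group": ["10m&Under", "10m&Under", "Over10m", "10m&Under"],
    "Port of Landing": ["Aberdaran", "Aberdaran", "Aberdaran", "Aberdaran"],
    "Value(£)": [100.0, 50.0, 30.0, 20.0],
})

grp_cols = ["Year", "Length Group", "Port of Landing"]
totals = df.groupby(grp_cols)["Value(£)"].sum()

# Look up a single combination with a tuple in the same order as grp_cols.
one_total = totals.loc[(2016, "10m&Under", "Aberdaran")]
print(one_total)

# reset_index turns the grouping levels back into regular columns.
flat = totals.reset_index()
print(flat)
```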
You can also total each column on its own, one at a time:
for col in grp_cols:
    print(df.groupby(col)["Value(£)"].sum())
You can also use .loc to get 2016 only.
df.loc[df["Year"] == 2016, "Value(£)"].sum()
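The same filtering idea extends to the single combination mentioned in the question. This is only a sketch on made-up rows (the values are hypothetical), combining boolean conditions with & before passing the mask to .loc:

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    "Year": [2016, 2016, 2016, 2017],
    "Length Group": ["10m&Under", "10m&Under", "Over10m", "10m&Under"],
    "Port of Landing": ["Aberdaran", "Aberdaran", "Aberdaran", "Aberdaran"],
    "Value(£)": [100.0, 50.0, 30.0, 20.0],
})

# Total for 2016 only.
year_total = df.loc[df["Year"] == 2016, "Value(£)"].sum()

# Total for one Year / Length Group / Port of Landing combination.
# Each condition needs its own parentheses when combined with &.
mask = (
    (df["Year"] == 2016)
    & (df["Length Group"] == "10m&Under")
    & (df["Port of Landing"] == "Aberdaran")
)
combo_total = df.loc[mask, "Value(£)"].sum()
print(year_total, combo_total)
```

This is handy for spot-checking one group, but groupby is the better fit when you need every combination at once.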
The pd.DataFrame.groupby functionality also lets you aggregate with functions other than .sum, including custom functions that operate on the sub-dataframes.
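As a sketch of aggregating with something other than a sum, the example below applies several built-in aggregations at once and then a custom function. The rows and the value_range helper are hypothetical, made up for illustration; here the custom function receives each group's Value(£) values rather than the whole sub-dataframe.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    "Year": [2016, 2016, 2016, 2017],
    "Length Group": ["10m&Under", "10m&Under", "Over10m", "10m&Under"],
    "Port of Landing": ["Aberdaran", "Aberdaran", "Aberdaran", "Aberdaran"],
    "Value(£)": [100.0, 50.0, 30.0, 20.0],
})

grp_cols = ["Year", "Length Group", "Port of Landing"]
grouped = df.groupby(grp_cols)["Value(£)"]

# Several built-in aggregations in one call; each becomes a column.
summary = grouped.agg(["sum", "mean", "max"])
print(summary)


def value_range(values):
    """Hypothetical custom aggregation: spread between largest and smallest value."""
    return values.max() - values.min()


# agg calls value_range once per group with that group's values.
ranges = grouped.agg(value_range)
print(ranges)
```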