I have an aggregation function which totals rows in a certain column based on an ID. After being able to correctly aggregate my rows, I wanted to select only the relevant columns, but I keep getting an error saying my ID column isn't found.
Full Code:
import pandas as pd
# initialize list of lists
data = [['A29', 112, 10, 0.3], ['A29',112, 15, 0.1], ['A29', 112, 14, 0.22], ['A29', 88, 33, 0.09], ['A29', 88, 29, 0.1], ['A29', 88, 6, 0.2]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Id', 'Cores', 'Provisioning', 'Utilization'])
df['total'] = df['Provisioning'] * df['Utilization']
df=df[['Id', 'Cores','total']]
aggregation_functions = {'Cores': 'first', 'total': 'sum'}
df_new = df.groupby(df['Id']).aggregate(aggregation_functions)
df_new['total1']=df_new['total']/3
print(df_new) #the dataframe contains the Id column
print(df_new.columns) #doesn't print Id column
df_new=df_new[['Id', 'total1']] #Error: Id column not found
I'm not sure what is happening here. A line above, I print the dataframe and the Id column is present. However, when I try selecting it, it returns an error saying it isn't found. How can I fix this issue?
CodePudding user response:
You should use as_index=False in the call to .groupby(); the Id column is part of the index, which prevents you from selecting it in the desired manner:
df_new = df.groupby(df['Id'], as_index=False).aggregate(aggregation_functions)
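As a minimal sketch of why the selection fails and two ways around it, the snippet below reuses the data from the question. It first shows that a default groupby puts Id in the index rather than in the columns, then keeps Id as a column with as_index=False, and finally shows reset_index() as an alternative. Passing the column name 'Id' instead of the Series df['Id'] is a substitution made here for brevity, and the names indexed, df_alt and result are illustrative only.

```python
import pandas as pd

data = [['A29', 112, 10, 0.3], ['A29', 112, 15, 0.1], ['A29', 112, 14, 0.22],
        ['A29', 88, 33, 0.09], ['A29', 88, 29, 0.1], ['A29', 88, 6, 0.2]]
df = pd.DataFrame(data, columns=['Id', 'Cores', 'Provisioning', 'Utilization'])
df['total'] = df['Provisioning'] * df['Utilization']

aggregation_functions = {'Cores': 'first', 'total': 'sum'}

# Default groupby: the group key becomes the index, not a column.
indexed = df.groupby('Id').aggregate(aggregation_functions)
print(indexed.index.name)     # Id
print(list(indexed.columns))  # ['Cores', 'total']

# Option 1: keep the key as a regular column from the start.
df_new = df.groupby('Id', as_index=False).aggregate(aggregation_functions)

# Option 2: aggregate first, then move the index back into a column.
df_alt = indexed.reset_index()

# With Id available as a column, the original selection works.
df_new['total1'] = df_new['total'] / 3
result = df_new[['Id', 'total1']]
print(result)
```

Either option leaves Id selectable by name. If the result is only needed for display, leaving Id in the index and selecting df_new[['total1']] would also work.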