I really don't understand why I get a memory address as output when I create a DataFrame with groupby on 'Courses'.
code:
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark","Python","NA"],
'Fee' :[22000,25000,23000,24000,26000,25000,25000,22000,1500],
'Duration':['30days','50days','55days','40days','60days','35days','30days','50days','40days'],
'Discount':[1000,2300,1000,1200,2500,None,1400,1600,0]
})
df = pd.DataFrame(technologies)
print(df)
df2 = df.groupby(['Courses'])
print(df2)
Output:
Courses Fee Duration Discount
0 Spark 22000 30days 1000.0
1 PySpark 25000 50days 2300.0
2 Hadoop 23000 55days 1000.0
3 Python 24000 40days 1200.0
4 Pandas 26000 60days 2500.0
5 Hadoop 25000 35days NaN
6 Spark 25000 30days 1400.0
7 Python 22000 50days 1600.0
8 NA 1500 40days 0.0
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000290E76C40A0>
Answer:
Provide an aggregation function so that some calculation is performed on each group. Look at the examples here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html
df.groupby(['Courses']).size()
Courses
Hadoop 2
NA 1
Pandas 1
PySpark 1
Python 2
Spark 2
dtype: int64
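size() is just one aggregation. As a minimal sketch (rebuilding the question's DataFrame; the output column names total_fee, mean_discount and n_rows are made up for illustration), pandas' named aggregation computes several summaries per course in one call:

```python
import pandas as pd

# Same data as in the question
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas", "Hadoop", "Spark", "Python", "NA"],
    'Fee': [22000, 25000, 23000, 24000, 26000, 25000, 25000, 22000, 1500],
    'Duration': ['30days', '50days', '55days', '40days', '60days', '35days', '30days', '50days', '40days'],
    'Discount': [1000, 2300, 1000, 1200, 2500, None, 1400, 1600, 0],
}
df = pd.DataFrame(technologies)

# Named aggregation: each keyword becomes an output column,
# built from a (source column, aggregation function) pair
summary = df.groupby('Courses').agg(
    total_fee=('Fee', 'sum'),
    mean_discount=('Discount', 'mean'),  # NaN discounts are skipped
    n_rows=('Fee', 'count'),
)
print(summary)
```

The result is a regular DataFrame indexed by course, so it prints as a table instead of an object address.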
Answer:
groupby returns a DataFrameGroupBy object, and print() only shows its default representation: the class name plus the memory address where the object is stored. It does not display any data until you apply a function to it, such as an aggregation (sum, max, mean, etc.). You can also iterate through the grouped object and print each element to see which rows are associated with each grouped value.
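A short sketch of iterating over the groups, using only the Courses and Fee columns from the question. It passes the column name as a plain string rather than a one-element list; in recent pandas versions a list key makes the group keys 1-tuples, so the string form keeps them as plain course names.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas", "Hadoop", "Spark", "Python", "NA"],
    'Fee': [22000, 25000, 23000, 24000, 26000, 25000, 25000, 22000, 1500],
})
grouped = df.groupby('Courses')

# Each iteration yields (group key, sub-DataFrame with the matching rows)
for course, rows in grouped:
    print(f"--- {course} ---")
    print(rows)

# Mapping of group key -> index labels of the rows in that group
print(grouped.groups)

# Fetch a single group directly by its key
hadoop_rows = grouped.get_group('Hadoop')
print(hadoop_rows)
```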
I hope this helps.
Answer:
This is probably because groupby
should be followed by an operation like mean
, max
, etc.
See this example taken from here
import pandas as pd
df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
                              'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 24., 26.]})
print(df)
print(df.groupby(['Animal']).mean())
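Applied to the question's DataFrame, one caveat is that 'Duration' holds strings. Depending on the pandas version, a bare .mean() on the groups either drops that column or raises a TypeError. A minimal sketch of two ways to restrict the mean to numeric columns:

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas", "Hadoop", "Spark", "Python", "NA"],
    'Fee': [22000, 25000, 23000, 24000, 26000, 25000, 25000, 22000, 1500],
    'Duration': ['30days', '50days', '55days', '40days', '60days', '35days', '30days', '50days', '40days'],
    'Discount': [1000, 2300, 1000, 1200, 2500, None, 1400, 1600, 0],
})

# Option 1: select the numeric columns before aggregating
fee_stats = df.groupby('Courses')[['Fee', 'Discount']].mean()
print(fee_stats)

# Option 2: let mean() skip non-numeric columns such as 'Duration'
also_ok = df.groupby('Courses').mean(numeric_only=True)
print(also_ok)
```

Selecting columns explicitly is the more predictable choice, because the result does not depend on which columns happen to be numeric.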