Pandas .groupby is returning an address

Time:10-23

I really don't understand why I am getting a memory address as output when I create a DataFrame with groupby on 'Courses'.

code:

import pandas as pd
technologies   = ({
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark","Python","NA"],
    'Fee' :[22000,25000,23000,24000,26000,25000,25000,22000,1500],
    'Duration':['30days','50days','55days','40days','60days','35days','30days','50days','40days'],
    'Discount':[1000,2300,1000,1200,2500,None,1400,1600,0]
          })
df = pd.DataFrame(technologies)
print(df)

df2 = df.groupby(['Courses'])
print(df2)

Output:

  Courses    Fee Duration  Discount
0    Spark  22000   30days    1000.0
1  PySpark  25000   50days    2300.0
2   Hadoop  23000   55days    1000.0
3   Python  24000   40days    1200.0
4   Pandas  26000   60days    2500.0
5   Hadoop  25000   35days       NaN
6    Spark  25000   30days    1400.0
7   Python  22000   50days    1600.0
8       NA   1500   40days       0.0
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000290E76C40A0>

CodePudding user response:

Provide an aggregate function to perform some calculation on the groups. Look at the examples here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html

df.groupby(['Courses']).size()
Courses
Hadoop     2
NA         1
Pandas     1
PySpark    1
Python     2
Spark      2
dtype: int64

CodePudding user response:

The groupby object is stored at a certain address in memory, and printing it shows only its default representation (the type and that address), not any data. It does not compute anything until you apply a function to it: an aggregation such as max, mean, etc. You can also iterate through the grouped object and print each element to see which rows belong to each group key.
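A minimal sketch of that iteration, using a shortened version of the question's data (the variable names `grouped`, `spark_rows` and `group_sizes` are only illustrative). Grouping by the plain string 'Courses' rather than a one-element list keeps each group key a simple string:

```python
import pandas as pd

# Shortened version of the question's data, for illustration only
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Spark"],
    "Fee": [22000, 25000, 23000, 25000],
})

grouped = df.groupby("Courses")  # a DataFrameGroupBy object; no result computed yet

# Iterating yields (group key, sub-DataFrame) pairs
for name, group in grouped:
    print(name)
    print(group)

# Pull out the rows of a single group, or count the rows per group
spark_rows = grouped.get_group("Spark")
group_sizes = grouped.size()
```

Each `group` is an ordinary DataFrame holding the rows that share that key.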

I hope this helps.

CodePudding user response:

This is because groupby on its own only returns a DataFrameGroupBy object; it should be followed by an operation like mean, max, etc.

See this example taken from here

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 24., 26.]})
df

df.groupby(['Animal']).mean()
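
Applied to the DataFrame from the question, the same idea might look like the sketch below. The numeric columns are selected explicitly because 'Duration' holds strings, and recent pandas versions may raise an error when asked for the mean of a string column. The names `mean_by_course` and `summary`, and the output column names `total_fee` and `max_discount`, are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "Pandas",
                "Hadoop", "Spark", "Python", "NA"],
    "Fee": [22000, 25000, 23000, 24000, 26000, 25000, 25000, 22000, 1500],
    "Duration": ["30days", "50days", "55days", "40days", "60days",
                 "35days", "30days", "50days", "40days"],
    "Discount": [1000, 2300, 1000, 1200, 2500, None, 1400, 1600, 0],
})

# Mean of the numeric columns per course; NaN values are skipped by default
mean_by_course = df.groupby("Courses")[["Fee", "Discount"]].mean()
print(mean_by_course)

# Different aggregations per column, using named aggregation
summary = df.groupby("Courses").agg(
    total_fee=("Fee", "sum"),
    max_discount=("Discount", "max"),
)
print(summary)
```

Both results are regular DataFrames indexed by 'Courses', so printing them shows a table instead of an object address.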