Problem in pandas (Python) that I think may be caused by incorrect use of groupby

My dataset looks like this:

 A    B    C    CompanyName   Sector    year
 4    9    3         d          10       2000 
 2    4    45        f          78       2001
 7   53    55        y          99       2000

I want it to look like this:

 MeanA MeanB MeanC medianC   Sector  Year
 bla     bla   bla  bla        bla    bla
 bla     bla   bla  bla        bla    bla
 bla     bla   bla  bla        bla    bla
 bla     bla   bla  bla        bla    bla

My first idea was to group by Sector and year, then use .agg() to calculate MeanA, MeanB, MeanC and medianC. The problem is that MeanC has strange empty cells even in groups where medianC exists, so I would expect the mean to have a value there as well.

This is an example of the code:

 Data = Data.groupby(['Sector', 'year']).agg({'A': 'mean', 'B': 'mean', 'C': ['mean', 'median']})
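For reference, the dict form of .agg() with a list for C returns MultiIndex columns. A flat layout closer to the desired table can be sketched with named aggregation, assuming pandas 0.25 or later. The toy values and the variable names `data` and `summary` are made up for illustration.

```python
import pandas as pd

# Toy data mirroring the columns in the question (values are made up).
data = pd.DataFrame({
    "A": [4, 2, 7, 6],
    "B": [9, 4, 53, 1],
    "C": [3, 45, 55, 5],
    "CompanyName": ["d", "f", "y", "z"],
    "Sector": [10, 78, 99, 10],
    "year": [2000, 2001, 2000, 2000],
})

# Named aggregation: each keyword becomes a flat output column,
# defined as (source column, aggregation function).
summary = (
    data.groupby(["Sector", "year"])
        .agg(
            MeanA=("A", "mean"),
            MeanB=("B", "mean"),
            MeanC=("C", "mean"),
            medianC=("C", "median"),
        )
        .reset_index()  # turn Sector and year back into ordinary columns
)
print(summary)
```

This only changes the shape of the result. It does not by itself explain empty cells in MeanC.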

I think I am using groupby incorrectly. Any help would be appreciated.

PS: my dataset contains about 120k rows covering 2000 to 2015, with multiple companies.

CodePudding user response：

What is the dtype of each column? Are A, B and C all numeric, or can you convert them to int or float, or is your dataset dirty? If groupby works for A and B but fails for C, a data quality issue in C is the likely cause.

You can also call mean() directly as the aggregation function:

df.groupby(['Sector', 'year'])['C'].mean()
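The dtype question above can be checked in a few lines. This is a minimal sketch on a hypothetical frame in which C arrived as strings with one unparseable entry. The data and the name `mean_c` are assumptions, not taken from the original dataset.

```python
import pandas as pd

# Hypothetical frame where C was read in as strings, with one bad entry.
df = pd.DataFrame({
    "Sector": [10, 10, 78],
    "year": [2000, 2000, 2001],
    "C": ["3", "oops", "45"],
})

print(df.dtypes)  # C is reported as object rather than a numeric dtype

# Coerce to numeric; values that cannot be parsed become NaN.
df["C"] = pd.to_numeric(df["C"], errors="coerce")

# Select the column first, then aggregate. mean() skips NaN by default.
mean_c = df.groupby(["Sector", "year"])["C"].mean()
print(mean_c)
```

Counting `df["C"].isna().sum()` after the coercion shows how many entries were not parseable.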

CodePudding user response：

The problem was caused by a division by zero in column C, so that column contained inf and -inf values, which produced the empty cells in the groupby/agg output. So thanks to the NaN cells at the groupby stage, I discovered a serious error in my data. Thanks for your time, all.
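A small reproduction of that diagnosis, as a sketch on made-up data. It assumes C is computed as a ratio of two hypothetical columns, `num` and `den`. When a group contains both inf and -inf, their sum is undefined, so the mean comes out as NaN while the median can still be finite.

```python
import numpy as np
import pandas as pd

# Made-up data: C is a ratio whose denominator is sometimes zero.
df = pd.DataFrame({
    "Sector": [10, 10, 10, 10, 78, 78],
    "year": [2000] * 4 + [2001] * 2,
    "num": [1.0, -1.0, 4.0, 6.0, 3.0, 9.0],
    "den": [0.0, 0.0, 2.0, 2.0, 3.0, 3.0],
})
df["C"] = df["num"] / df["den"]  # 1/0 gives inf, -1/0 gives -inf

# Count the infinite entries before aggregating.
n_inf = np.isinf(df["C"]).sum()
print("infinite values in C:", n_inf)

# With inf and -inf in the same group, the mean is NaN.
before = df.groupby(["Sector", "year"])["C"].agg(["mean", "median"])
print(before)

# One option: treat infinities as missing so that mean() skips them.
df["C"] = df["C"].replace([np.inf, -np.inf], np.nan)
after = df.groupby(["Sector", "year"])["C"].agg(["mean", "median"])
print(after)
```

Replacing infinities with NaN is only one possible choice. Whether those rows should be dropped, corrected upstream, or kept as missing depends on what a zero denominator means in the data.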