My dataset looks like this:
A B C CompanyName Sector year
4 9 3 d 10 2000
2 4 45 f 78 2001
7 53 55 y 99 2000
I want it to look like this:
MeanA MeanB MeanC medianC Sector Year
bla bla bla bla bla bla
bla bla bla bla bla bla
bla bla bla bla bla bla
bla bla bla bla bla bla
My first idea was to group by Sector and year and then use .agg() to calculate the mean of A, B and C and the median of C. The problem is that the mean of C has strange empty cells even where the median of C exists, so I would expect the mean to have a value there as well.
This is an example of the code:
Data=Data.groupby(['Sector','year']).agg({'A':'mean', 'B':'mean', "C":['mean', 'median']})
I think I am using the groupby function the wrong way. Any help will be appreciated.
PS. My dataset contains about 120k rows covering 2000 to 2015, with multiple companies.
CodePudding user response:
What is the dtype of each column? Are A, B and C all numeric, or can you convert them to int or float, or is your dataset dirty? If groupby works for A and B, data quality is likely the issue if it suddenly fails for C.
As an aggregation function, you can also call mean() directly on the grouped column:
df.groupby(['Sector', 'year'])['C'].mean()
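The dtype check suggested above could look roughly like the sketch below. The toy frame and its values are made up for illustration (they are not the asker's data), and it assumes the dirty column was read in as strings.

```python
import pandas as pd

# Hypothetical toy frame: C was read as strings and one entry is not a number
df = pd.DataFrame({
    "Sector": [10, 78, 99],
    "year": [2000, 2001, 2000],
    "A": [4, 2, 7],
    "C": ["3", "45", "n/a"],
})

# Inspect the dtypes first; a non-numeric dtype for C hints at dirty data
print(df.dtypes)

# errors="coerce" turns entries that cannot be parsed into NaN instead of raising
df["C"] = pd.to_numeric(df["C"], errors="coerce")

# Mean of C per (Sector, year) group; NaN entries are skipped by default
means = df.groupby(["Sector", "year"])["C"].mean()
print(means)
```

Coercing silently hides bad entries, so it is probably worth counting the NaN values afterwards (for example with df["C"].isna().sum()) before trusting the aggregates.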
CodePudding user response:
The problem was caused by a division by zero in column C, which left inf and -inf values in that column. Those values produced the empty (NaN) cells in the output of the groupby/agg line. So thanks to the NaN cells at the groupby stage, I discovered a serious error in my data. Thanks for your time, all.
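A minimal sketch of this effect, using made-up values rather than the real dataset: when a group contains both inf and -inf, their sum is undefined, so the mean comes out as NaN while the median can still be a finite number. Replacing the infinities with NaN before grouping is one possible fix, not necessarily the one used here; fixing the division upstream may be the better option.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: C holds inf and -inf, as a division by zero can produce
df = pd.DataFrame({
    "Sector": [10, 10, 10, 78, 78],
    "year": [2000, 2000, 2000, 2001, 2001],
    "C": [np.inf, -np.inf, 3.0, 45.0, 55.0],
})

# Group (10, 2000): inf + (-inf) is NaN, so the mean is NaN,
# while the median of [-inf, 3.0, inf] is still 3.0
raw = df.groupby(["Sector", "year"])["C"].agg(["mean", "median"])
print(raw)

# Count the infinite entries to confirm they are the cause
n_inf = int(np.isinf(df["C"]).sum())
print(n_inf)

# One possible cleanup: treat infinities as missing before aggregating
cleaned = df.replace([np.inf, -np.inf], np.nan)
fixed = cleaned.groupby(["Sector", "year"])["C"].agg(["mean", "median"])
print(fixed)
```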