I have a dataframe like as below
df = pd.DataFrame({'subject_id':[1,1,1,1,1,2,2,2,2,2],
'qtr_info' :['2017Q1','2017Q3','2017Q4','2018Q1','2018Q4','2017Q1','2017Q4','2018Q2','2018Q4','2019Q1'],
'val' :[5,5,5,5,1,6,5,5,8,3],
'Prod_id':['A','B','C','A','E','Q','G','F','G','H']})
I would like to do the below
a) Fill all the missing quarters of a subject
b) fillna for other columns using the mean value
for respective columns (for the same subject). Don't refer other subject ids for computing mean value
So, I tried the below
df_time.resample('Q').mean() #didn't work
df_time.groupby('subject_id').resample('Q).mean() #didn't work
But I got the below error
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'
I expect my output to be like as below for subject_id = 1. Here yellow color shows newly inserted rows (to fill the missing gaps)
CodePudding user response:
Create PeriodIndex
first and for resample use Resampler.agg
with dictionary, for replace missing values by means use Series.fillna
with GroupBy.transform
:
df_time['qtr_info'] = pd.PeriodIndex(df_time['qtr_info'], freq='Q')
df = (df_time.set_index('qtr_info')
.groupby('subject_id')
.resample('Q')
.agg({'val':'mean', 'Prod_id':'ffill'}))
df['val'] = df['val'].fillna(df.groupby(level=0)['val'].transform('mean'))
print (df)
val Prod_id
subject_id qtr_info
1 2017Q1 5.0 A
2017Q2 4.2 A
2017Q3 5.0 B
2017Q4 5.0 C
2018Q1 5.0 A
2018Q2 4.2 A
2018Q3 4.2 A
2018Q4 1.0 E
2 2017Q1 6.0 Q
2017Q2 5.4 Q
2017Q3 5.4 Q
2017Q4 5.0 G
2018Q1 5.4 G
2018Q2 5.0 F
2018Q3 5.4 F
2018Q4 8.0 G
2019Q1 3.0 H