I have a DataFrame similar to this:
import pandas as pd
id = ['A','A','A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','C','C','C','C','C','C',]
time = ['2021-05-02 01:00:00','2021-05-02 02:00:00','2021-05-02 03:00:00','2021-05-02 04:00:00',
'2021-05-02 05:00:00','2021-05-02 06:00:00','2021-05-02 07:00:00','2021-05-02 08:00:00',
'2021-05-02 09:00:00','2021-05-02 10:00:00','2021-05-02 11:00:00','2021-05-02 12:00:00',
'2021-05-02 01:00:00','2021-05-02 02:00:00','2021-05-02 04:00:00','2021-05-02 05:00:00',
'2021-05-02 08:00:00','2021-05-02 09:00:00','2021-05-02 01:00:00','2021-05-02 02:00:00',
'2021-05-02 04:00:00','2021-05-02 05:00:00',
'2021-05-02 08:00:00','2021-05-02 10:00:00']
in_count = [1,1,1,2,1,1,2,5,1,2,1,1,1,2,2,3,1,1,2,1,1,1,2,1]
out_count =[1,1,1,1,1,2,1,1,1,3,1,1,2,2,1,1,2,1,1,2,1,1,2,1]
in_distance = [12,12,14,12,10,8,12,10,12,12,13,12,12,12,11,18,13,12,20,21,15,12,12,21]
out_distance = [10,10,10,11,11,21,12,14,12,13,13,13,22,21,13,12,21,11,11,21,21,11,11,21]
d = {'id': id, 'time': time, 'in_count':in_count,'out_count':out_count,'in_dist':in_distance,'out_dist':out_distance}
df = pd.DataFrame(d)
df['time'] = pd.to_datetime(df['time'], format = '%Y-%m-%d %H:%M:%S')
df = df.pivot(index='id', columns='time', values=['in_count', 'out_count','out_dist','in_dist'])
df
I need to replace the NaN values in the in_dist columns with the mean of the non-NaN in_dist values for that specific ID.
i.e., the in_dist NaNs for ID B would be filled with the mean of 12, 12, 11, 18, 13, 12, which is 13 (ignoring the in_count values, out_dist values, etc. for ID B)
CodePudding user response:
Use slicing and fillna:
df['in_dist'] = df['in_dist'].T.fillna(df['in_dist'].mean(axis=1)).T
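DataFrame.fillna matches a Series against the column labels and does not support filling with a Series along axis=1, so we transpose, fill with the per-row means, and transpose back. mean(axis=1) skips NaN by default, so each ID's mean is computed only from its existing in_dist values.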
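To see the double transposition in isolation, here is a minimal sketch on a toy frame. The frame, its column names (t1, t2, t3) and its values are made up for illustration and stand in for df['in_dist']. It also shows a possible alternative with DataFrame.where, which should give the same result without transposing.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df['in_dist']: one row per id, one column per timestamp.
# Labels and values are hypothetical, not taken from the question's data.
in_dist = pd.DataFrame(
    {
        "t1": [12.0, 12.0, 20.0],
        "t2": [12.0, np.nan, 21.0],
        "t3": [14.0, 11.0, np.nan],
    },
    index=pd.Index(["A", "B", "C"], name="id"),
)

# Per-id mean of the existing values (NaN is skipped by default).
row_means = in_dist.mean(axis=1)

# Double transposition: after .T the ids are the columns, so fillna can
# match the Series of means against them; the second .T restores the shape.
filled_t = in_dist.T.fillna(row_means).T

# Alternative without transposing: keep existing values, and take the
# row mean (aligned on the index via axis=0) wherever a value is missing.
filled_w = in_dist.where(in_dist.notna(), row_means, axis=0)

print(filled_t)
```

Either result can be assigned back with df['in_dist'] = ... as in the answer above. The where variant avoids the two transposes, which may matter on wide frames, though the difference is likely small for data of this size.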