pandas: calculating median values based on the same timestamps

I'd like to calculate the median of the data for rows that share the same timestamp, using pandas.

A sample of my dataframe looks like this:

timestamp	data
8/2/2021 16:11:34	30
8/2/2021 16:11:34	32
8/2/2021 16:11:34	50
10/12/2021 19:25:17	40
10/12/2021 19:25:17	50
10/12/2021 19:25:17	60

I'd like this df to look like this:

timestamp	data	median
8/2/2021 16:11:34	30	32
8/2/2021 16:11:34	32	32
8/2/2021 16:11:34	50	32
10/12/2021 19:25:17	40	50
10/12/2021 19:25:17	50	50
10/12/2021 19:25:17	60	50

I was thinking of creating a new column and writing a for loop that collects the data for each timestamp into an array and then calculates the median, but I suspect that would be too slow (the data has around 30,000 records). I looked for an easier solution but couldn't find one. Any suggestions would be much appreciated.

CodePudding user response:

One option is to call median in groupby.transform. Select the data column first so that it returns a Series with the same index as df, where each row holds the median of its timestamp group. No explicit for loop is needed.

df['median'] = df.groupby('timestamp')['data'].transform('median')

Output:

             timestamp  data  median
0    8/2/2021 16:11:34    30    32.0
1    8/2/2021 16:11:34    32    32.0
2    8/2/2021 16:11:34    50    32.0
3  10/12/2021 19:25:17    40    50.0
4  10/12/2021 19:25:17    50    50.0
5  10/12/2021 19:25:17    60    50.0
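
For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. It assumes the timestamps are stored as plain strings exactly as shown in the question and that the value column is named data; the sample frame is rebuilt by hand from the question's table.

```python
import pandas as pd

# Sample rows copied from the question; timestamps are kept as plain strings.
df = pd.DataFrame({
    "timestamp": ["8/2/2021 16:11:34"] * 3 + ["10/12/2021 19:25:17"] * 3,
    "data": [30, 32, 50, 40, 50, 60],
})

# Group by timestamp, then broadcast each group's median back onto its rows.
# Selecting "data" first makes transform return a Series aligned with df's index.
df["median"] = df.groupby("timestamp")["data"].transform("median")

print(df)
```

Because transform keeps the original index and row order, the result can be assigned straight into a new column without a merge. Grouping on strings only matches timestamps that are written identically, so if the same moment could appear in different formats, converting the column with pd.to_datetime before grouping is probably safer.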