I have the following dataframe (with thousands of rows, here's just a sample):
cluster_id  bin_time  trial_time  spikes
1           0         0.5         1
2           5         0.7         0
6           0         0.5         5
3           5         0.7         0
2           5         0.5         1
8           0         0.7         0
The first three columns contain duplicated values. I want to group the spikes (last column) so that for every unique trial_time and every unique bin_time I get the sum of spikes. Spikes with different cluster_id should simply be binned together according to the trial_time and bin_time they belong to (summed up, so that dimension can disappear).
In the end I am trying to get a matrix with the unique trial_time values on the Y axis and the unique bin_time values on the X axis, where each entry is the sum of spikes (so I don't care about the differentiation based on 'cluster_id').
Here's what I tried:
new_df = df.groupby('trial_time')['spikes'].apply(lambda x: list(x))
but that still gives me, for every trial_time, too many duplicated values coming from the different cluster_id...
Would greatly appreciate any help! I'm very stuck.
For the small example data above, I would like the output to be a 2D np array:
trial 0.5, bin time 0 --> total of 1 + 5 = 6 spikes
trial 0.5, bin time 5 --> total of 1 spike
trial 0.7, bin time 0 --> total of 0 spikes
trial 0.7, bin time 5 --> total of 0 + 0 = 0 spikes
So ideally, a 2d array with just the spike numbers.
CodePudding user response:
Group by the two columns and aggregate by summing. Don't use the first column when grouping since you don't care about it.
qf = df.iloc[:, 1:].groupby(['trial_time', 'bin_time']).sum()
# equivalent, selecting the columns by name:
# qf = df.loc[:, ['trial_time', 'bin_time', 'spikes']].groupby(['trial_time', 'bin_time']).sum()
                     spikes
trial_time bin_time
0.5        0.0           6
           5.0           1
0.7        0.0           0
           5.0           0
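If you want the result as a plain 2D NumPy array (trial_time on the Y axis, bin_time on the X axis), you can unstack the grouped result. Here is a minimal sketch along those lines; the names mat and arr are just illustrative, and fill_value=0 assumes a missing (trial_time, bin_time) combination should count as 0 spikes:
# sum spikes per (trial_time, bin_time), then pivot bin_time into columns
mat = (df.groupby(['trial_time', 'bin_time'])['spikes']
         .sum()
         .unstack('bin_time', fill_value=0))
arr = mat.to_numpy()  # 2D np array: rows = trial_time, columns = bin_time
For the sample data this should give [[6, 1], [0, 0]], with mat.index holding the trial_time values and mat.columns the bin_time values.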