Home > Software engineering >  Grouping based on unique values of multiple panda columns
Grouping based on unique values of multiple panda columns

Time:07-11

I have the following dataframe (with thousands of rows, here's just a sample):

cluster_id     bin_time     trial_time           spikes
1               0               0.5                1
2               5               0.7                0 
6               0               0.5                5
3               5               0.7                0
2               5               0.5                1
8.              0.              0.7.               0

First three columns have some duplicated values. I want to group spikes (last column), so that for every unique trial_time and unique value of bin_time I get the sum of spikes. Spikes with different cluster_id should just be binned together according to which trial and bin_time they belong to (counted in a sum, this dimension can disappear).

In the end I am trying to get a matrix that has coordinates of a unique trial_time on Y axis, and unique bin_time on X axis, and each entry is a sum of spikes (so I don't care about the differentiation based on 'cluster_id').

Here's what I tried:

new_df = groupby('trial_time')['spikes']apply(lambda x: list(x))

but that still gives me for every trial_time too many duplicated values based on cluster_id...

Would greatly appreciate any help! I'm very stuck.

For the small example data I gave I would like an output of 2D np array:

trial 0.5, bin time 0 --> count of total 1 5=6 spikes
trial 0.5, bin time 5 --> count of total 1 spike
trial 0.7, bin time 0 --> count of 0 spikes
trial 0.7, bin time 5 --> count of 0 0 = 0

So ideally, a 2d array with just the spike numbers.

CodePudding user response:

Group by the two columns and aggregate by summing. Don't use the first column when grouping since you don't care about it.

qf = df.iloc[:,1:].groupby(['bin_time','trial_time']).agg(sum)
#qf = df.loc[:,['bin_time','trial_time','spikes']].groupby(['trial_time','bin_time']).agg(sum)

trial_time bin_time
0.5        0.0            6
           5.0            1
0.7        5.0            0
0.7.       0.0            0
  • Related