I have the following dataframe (with thousands of rows, here's just a sample):
cluster_id  bin_time  trial_time  spikes
1           0         0.5         1
2           5         0.7         0
6           0         0.5         5
3           5         0.7         0
2           5         0.5         1
8           0         0.7         0
The first three columns contain duplicated values. I want to group the spikes (last column) so that for every unique trial_time and every unique bin_time I get the sum of spikes. Spikes with different cluster_id should simply be binned together according to the trial_time and bin_time they belong to (summed up, so that dimension can disappear).
In the end I am trying to get a matrix with the unique trial_time values on the Y axis and the unique bin_time values on the X axis, where each entry is the sum of spikes (so I don't care about the differentiation based on 'cluster_id').
Here's what I tried:
new_df = df.groupby('trial_time')['spikes'].apply(lambda x: list(x))
but that still gives me, for every trial_time, too many duplicated values coming from the different cluster_id...
Would greatly appreciate any help! I'm very stuck.
For the small example data above, I would like the output to be a 2D np array:
trial 0.5, bin time 0 --> total of 1 + 5 = 6 spikes
trial 0.5, bin time 5 --> total of 1 spike
trial 0.7, bin time 0 --> total of 0 spikes
trial 0.7, bin time 5 --> total of 0 + 0 = 0 spikes
So ideally, a 2d array with just the spike numbers.
CodePudding user response:
Group by the two columns and aggregate by summing. Don't use the first column when grouping since you don't care about it.
qf = df.iloc[:, 1:].groupby(['trial_time', 'bin_time']).sum()
# equivalent, selecting the columns by name:
# qf = df.loc[:, ['trial_time', 'bin_time', 'spikes']].groupby(['trial_time', 'bin_time']).sum()
                     spikes
trial_time bin_time
0.5        0.0           6
           5.0           1
0.7        0.0           0
           5.0           0
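If you want the result as a plain 2D NumPy array (trial_time on the Y axis, bin_time on the X axis), you can unstack the grouped result. Here is a minimal sketch along those lines; the names mat and arr are just illustrative, and fill_value=0 assumes a missing (trial_time, bin_time) combination should count as 0 spikes:
# sum spikes per (trial_time, bin_time), then pivot bin_time into columns
mat = (df.groupby(['trial_time', 'bin_time'])['spikes']
         .sum()
         .unstack('bin_time', fill_value=0))
arr = mat.to_numpy()  # 2D np array: rows = trial_time, columns = bin_time
For the sample data this should give [[6, 1], [0, 0]], with mat.index holding the trial_time values and mat.columns the bin_time values.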